* TC stats / hw offload question @ 2019-02-05 19:41 Edward Cree 2019-02-06 2:20 ` Jamal Hadi Salim 0 siblings, 1 reply; 21+ messages in thread From: Edward Cree @ 2019-02-05 19:41 UTC (permalink / raw) To: netdev; +Cc: Jiri Pirko, Jamal Hadi Salim, Cong Wang Regarding TC_CLSFLOWER_STATS, when a filter rule modifies the length of the packet, e.g. by adding a VLAN or encap header, should the bytes counter count the length of the packet _before_ edits (i.e. as seen by the match), or the length after edits? If the latter, what is the correct behaviour when (say) a packet is mirrored as-is but also redirected with encapsulation? The fact that the stats live in the struct tc_action suggests a per- action connection that would imply post-edit length, but then in tcf_exts_dump_stats() we only look at the first action, which seems to imply we really want the pre-edit length. I can't find any kind of doc or spec defining what behaviour is required. -Ed ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: TC stats / hw offload question 2019-02-05 19:41 TC stats / hw offload question Edward Cree @ 2019-02-06 2:20 ` Jamal Hadi Salim 2019-02-08 10:26 ` Edward Cree 2019-04-24 14:05 ` Edward Cree 0 siblings, 2 replies; 21+ messages in thread From: Jamal Hadi Salim @ 2019-02-06 2:20 UTC (permalink / raw) To: Edward Cree, netdev; +Cc: Jiri Pirko, Cong Wang On 2019-02-05 2:41 p.m., Edward Cree wrote: > Regarding TC_CLSFLOWER_STATS, when a filter rule modifies the length of > the packet, e.g. by adding a VLAN or encap header, should the bytes > counter count the length of the packet _before_ edits (i.e. as seen by > the match), or the length after edits? Classifier stats - when they exist - should be on "classified" packets. i.e before edits. Edits are typically done by actions. And even then we dont have counters which indicate the post edit resizing. > If the latter, what is the > correct behaviour when (say) a packet is mirrored as-is but also > redirected with encapsulation? It is that ambiguity that make it hard to maintain a single counter for post-edit size. packet and byte counts are on "entry". > The fact that the stats live in the struct tc_action suggests a per- > action connection that would imply post-edit length, The classifiers dont mod the packets. The actions do. And they maintain stats on the size on "entry" i.e pre-edit. > but then in > tcf_exts_dump_stats() we only look at the first action, which seems to > imply we really want the pre-edit length. > I can't find any kind of doc or spec defining what behaviour is required. > Each action keeps its own counters. If you did something like: tc match using flower blah \ action vlan push tag ... \ action redirect to egress of eth0 And you submited a packet of size x bytes, then the "match" would record x bytes. the "vlan action" would record x bytes. the "redirect" would record size x+vlaninfo bytes the egress of eth0 would recorr x+vlaninfo bytes cheers, jamal > -Ed > ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: TC stats / hw offload question 2019-02-06 2:20 ` Jamal Hadi Salim @ 2019-02-08 10:26 ` Edward Cree 2019-02-09 17:39 ` Jamal Hadi Salim 2019-04-24 14:05 ` Edward Cree 1 sibling, 1 reply; 21+ messages in thread From: Edward Cree @ 2019-02-08 10:26 UTC (permalink / raw) To: Jamal Hadi Salim, netdev; +Cc: Jiri Pirko, Cong Wang On 06/02/19 02:20, Jamal Hadi Salim wrote: > The classifiers dont mod the packets. The actions do. And they > maintain stats on the size on "entry" i.e pre-edit. Thank you for clearing that up. > Each action keeps its own counters. If you did something like: > > tc match using flower blah \ > action vlan push tag ... \ > action redirect to egress of eth0 > > And you submited a packet of size x bytes, > then the "match" would record x bytes. Sorry, where would it record that? I can't find any stats counters on the "match" either in the software path or the offload API. > the "vlan action" would record x bytes. > the "redirect" would record size x+vlaninfo bytes > the egress of eth0 would recorr x+vlaninfo bytes Am I right in thinking that offloaded counters don't do that? As far as I can tell, the drivers with flower offload all use tcf_exts_stats_update() which takes a single 'bytes' count and adds it to all the actions. (Presumably this is pre-edit length.) -Ed ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: TC stats / hw offload question 2019-02-08 10:26 ` Edward Cree @ 2019-02-09 17:39 ` Jamal Hadi Salim 2019-02-11 11:44 ` Edward Cree 0 siblings, 1 reply; 21+ messages in thread From: Jamal Hadi Salim @ 2019-02-09 17:39 UTC (permalink / raw) To: Edward Cree, netdev Cc: Jiri Pirko, Cong Wang, Or Gerlitz, Andy Gospodarek, PJ Waskiewicz, Anjali Singhai Jain, Jakub Kicinski On 2019-02-08 5:26 a.m., Edward Cree wrote: > On 06/02/19 02:20, Jamal Hadi Salim wrote: >> tc match using flower blah \ >> action vlan push tag ... \ >> action redirect to egress of eth0 >> >> And you submited a packet of size x bytes, >> then the "match" would record x bytes. > > Sorry, where would it record that? Sorry - that was hypothetical on my part. We dont record the bytes in the classifier. We do (on some classifiers) record hit counts, so we can see how many times a lookup on a specific filter happened and how many times that lookup resulted in a match. See u32 (or a few others Cong has been updating recently). > I can't find any stats counters on > the "match" either in the software path or the offload API. Hasnt been necessary thus far. Is your end goal to match and count? Most hardware has stats/counters as a separate table. Assuming yours does as well, then if all you want was to just match and accept, then you would add a filter with an accept/ok. i.e something along the lines of: tc match using flower blah \ action ok index 12345 The "action index" field can be used to map to a counter table index in hardware. Note if you dont explicitly specify the index, the kernel (and in this case h/w) issues one for you. >> the "vlan action" would record x bytes. >> the "redirect" would record size x+vlaninfo bytes >> the egress of eth0 would recorr x+vlaninfo bytes > Am I right in thinking that offloaded counters don't do that? As far > as I can tell, the drivers with flower offload all use > tcf_exts_stats_update() which takes a single 'bytes' count and adds > it to all the actions. (Presumably this is pre-edit length.) On the pre/post edit, perhaps some of the hardware owners could chime in +Ccing a few. To the tcf_exts_stats_update(): Note, most people interested in h/w offload will choose to skip sw (explicitly defined when you add a rule). The update synchronizes the hardware stats/counters to the kernel - so when you dump the policies in such a setup you are seeing what is reflected in hardware. cheers, jamal PS: I have to say just keeping one counter is at times insufficient. Example, the policer records how many bytes were seen, not how many bytes were allowed through. For tunnels, it is easy to compute post-edit size because you can easily compute later the added/removed bytes. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: TC stats / hw offload question 2019-02-09 17:39 ` Jamal Hadi Salim @ 2019-02-11 11:44 ` Edward Cree 2019-02-14 12:39 ` Jamal Hadi Salim 0 siblings, 1 reply; 21+ messages in thread From: Edward Cree @ 2019-02-11 11:44 UTC (permalink / raw) To: Jamal Hadi Salim, netdev Cc: Jiri Pirko, Cong Wang, Or Gerlitz, Andy Gospodarek, PJ Waskiewicz, Anjali Singhai Jain, Jakub Kicinski On 09/02/19 17:39, Jamal Hadi Salim wrote: > On 2019-02-08 5:26 a.m., Edward Cree wrote: >> I can't find any stats counters on >> the "match" either in the software path or the offload API. > > Hasnt been necessary thus far. > Is your end goal to match and count? My end goal is to implement TC offload in some hw we're designing here at Solarflare. So I'm trying to determine what hardware is expected/required to do. It might be possible to design our new hw so that we can attach a counter to every action, if that's what TC wants. But since the other vendors don't seem to do that, I wondered if there was a reason, or if perhaps the counter resources (and PCI bw to read them) could be saved if all those separate counters aren't really needed. Right now the design we are considering would only count packets as-matched, i.e. before any edits. That's fine for encap — you can calculate the bytes correction in SW — but not for decap since in principle the length of the RXed outer headers could vary (e.g. you might have IP options there). -Ed ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: TC stats / hw offload question 2019-02-11 11:44 ` Edward Cree @ 2019-02-14 12:39 ` Jamal Hadi Salim 2019-02-14 15:17 ` Andy Gospodarek 2019-02-18 18:56 ` Edward Cree 0 siblings, 2 replies; 21+ messages in thread From: Jamal Hadi Salim @ 2019-02-14 12:39 UTC (permalink / raw) To: Edward Cree, netdev Cc: Jiri Pirko, Cong Wang, Or Gerlitz, Andy Gospodarek, PJ Waskiewicz, Anjali Singhai Jain, Jakub Kicinski On 2019-02-11 6:44 a.m., Edward Cree wrote: >> Hasnt been necessary thus far. >> Is your end goal to match and count? > My end goal is to implement TC offload in some hw we're designing > here at Solarflare. So I'm trying to determine what hardware is > expected/required to do. > It might be possible to design our new hw so that we can attach a > counter to every action, if that's what TC wants. It makes sense to have a counter on every action - even if it is for debugging purposes. The two most basic actions are "drop" or "accept". In TC speak the default action is "classid x:y" which typically is to select a queue or give the flow some identity (one should be able to use the same action on h/w ingress as well to select a rx DMA ring for example, but that seems uncommon). Note, your counters should also be shareable; example, count all the drops in one counter across multiple flows as in the following case where counter index 1 is used. tc flower match foo action drop index 1 tc flower match bar action drop index 1 > But since the > other vendors don't seem to do that, I wondered if there was a > reason, or if perhaps the counter resources (and PCI bw to read > them) could be saved if all those separate counters aren't really > needed. Probably nobody has paid attention or asked as you did. Will let the h/w folks speak for themselves. My understanding based on experience is counters are cheap. Most modern NICs and ASICs have a gazillion of them at their disposal. > Right now the design we are considering would only count > packets as-matched, i.e. before any edits. That's fine for encap > — you can calculate the bytes correction in SW — but not for decap > since in principle the length of the RXed outer headers could > vary (e.g. you might have IP options there). > ok, so not much in terms of other types of actions. But for abstraction sake maybe use "flowid x:y" and have counters associated with that. Or even make this optional and only attach a counter if someone says "action ok" and allow them to specify the counter index (assuming you architecture has an indexed table of counters). cheers, jamal ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: TC stats / hw offload question 2019-02-14 12:39 ` Jamal Hadi Salim @ 2019-02-14 15:17 ` Andy Gospodarek 2019-02-18 18:56 ` Edward Cree 1 sibling, 0 replies; 21+ messages in thread From: Andy Gospodarek @ 2019-02-14 15:17 UTC (permalink / raw) To: Jamal Hadi Salim Cc: Edward Cree, netdev, Jiri Pirko, Cong Wang, Or Gerlitz, PJ Waskiewicz, Anjali Singhai Jain, Jakub Kicinski On Thu, Feb 14, 2019 at 07:39:28AM -0500, Jamal Hadi Salim wrote: > > On 2019-02-11 6:44 a.m., Edward Cree wrote: > > > But since the > > other vendors don't seem to do that, I wondered if there was a > > reason, or if perhaps the counter resources (and PCI bw to read > > them) could be saved if all those separate counters aren't really > > needed. > > Probably nobody has paid attention or asked as you did. That's not totally true, but I'm glad to hear that others are considering it. > Will let the h/w folks speak for themselves. My understanding > based on experience is counters are cheap. Most modern NICs > and ASICs have a gazillion of them at their disposal. Counters can be cheap to implement (though that does not always mean that everyone chooses to add many of them to hardware), but the real cost to them, as Edward points out, is the cost of accessing them an keeping them up to date. If we are concerned about keeping track of flow counters on thousands (or more) flows the cost on the PCI bus can be quite high. We have been looking at a few options to deal with tracking this massive number of counts, but are open to hearing what others feel they would like to see happen to this. Though stats sometimes seem boring to some developers, they are critical and we have been considering whether or not we need to think about different driver or global infra to handle it. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: TC stats / hw offload question 2019-02-14 12:39 ` Jamal Hadi Salim 2019-02-14 15:17 ` Andy Gospodarek @ 2019-02-18 18:56 ` Edward Cree 2019-02-18 19:37 ` Edward Cree 1 sibling, 1 reply; 21+ messages in thread From: Edward Cree @ 2019-02-18 18:56 UTC (permalink / raw) To: Jamal Hadi Salim, netdev Cc: Jiri Pirko, Cong Wang, Or Gerlitz, Andy Gospodarek, PJ Waskiewicz, Anjali Singhai Jain, Jakub Kicinski On 14/02/19 12:39, Jamal Hadi Salim wrote: > On 2019-02-11 6:44 a.m., Edward Cree wrote: >> My end goal is to implement TC offload in some hw we're designing >> here at Solarflare. So I'm trying to determine what hardware is >> expected/required to do. >> It might be possible to design our new hw so that we can attach a >> counter to every action, if that's what TC wants. > > It makes sense to have a counter on every action - even if it is > for debugging purposes. The two most basic actions are "drop" or > "accept". In TC speak the default action is "classid x:y" which > typically is to select a queue or give the flow some identity Perhaps I was insufficiently clear; we're only looking at the switching side of things (e.g. OVS offload) right now; we don't yet have a plan for 'delivery' filters (I imagine we'll probably initially port over our ethtool ntuple filter handling from ef10, though we may end up going down the TC route there). So I think at the moment 'classid' isn't relevant (?) > Note, your counters should also be shareable; example, count all > the drops in one counter across multiple flows as in the following > case where counter index 1 is used. > > tc flower match foo action drop index 1 > tc flower match bar action drop index 1 [...] > allow them to specify > the counter index (assuming you architecture has an indexed table > of counters). Our architecture allocates objects (including counters) and returns opaque handles to them, so we'd need a software table to connect counter index to FW counter ID. Also, sharing counters in hw causes extra work for the driver code that keeps track of which encap actions are getting hit (so it can keep the neighbour entries alive). Maybe summing the shared counters in sw is easier than that, I'm not sure (or maybe encap action counters should just be kept separate from the counters we report to TC). TBH I'm coming to the conclusion that what we should do for the first version of our driver is just to create a counter per rule and report it against the first action (only), and for now ignore the index (or maybe require it to be set to some distinguished value, like 0, to mean 'allocate', so that as a future extension we can support shareable counters). -Ed ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: TC stats / hw offload question 2019-02-18 18:56 ` Edward Cree @ 2019-02-18 19:37 ` Edward Cree 0 siblings, 0 replies; 21+ messages in thread From: Edward Cree @ 2019-02-18 19:37 UTC (permalink / raw) To: Jamal Hadi Salim, netdev Cc: Jiri Pirko, Cong Wang, Or Gerlitz, Andy Gospodarek, PJ Waskiewicz, Anjali Singhai Jain, Jakub Kicinski On 18/02/19 18:56, Edward Cree wrote: > (or > maybe require it to be set to some distinguished value, like 0, to > mean 'allocate', so that as a future extension we can support > shareable counters). Turns out 0 isn't an option, because TC already internally swallows that and allocates its own index (same as if 'index' wasn't specified on cmdline at all), unless we can change TC to pass that 0 on to the driver. -Ed ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: TC stats / hw offload question 2019-02-06 2:20 ` Jamal Hadi Salim 2019-02-08 10:26 ` Edward Cree @ 2019-04-24 14:05 ` Edward Cree 2019-04-24 14:11 ` Pablo Neira Ayuso 1 sibling, 1 reply; 21+ messages in thread From: Edward Cree @ 2019-04-24 14:05 UTC (permalink / raw) To: Jamal Hadi Salim, netdev, Pablo Neira Ayuso; +Cc: Jiri Pirko, Cong Wang On 06/02/2019 02:20, Jamal Hadi Salim wrote: > The classifiers dont mod the packets. The actions do. And they > maintain stats on the size on "entry" i.e pre-edit. > Each action keeps its own counters. If you did something like: > > tc match using flower blah \ > action vlan push tag ... \ > action redirect to egress of eth0 > > And you submited a packet of size x bytes, > then the "match" would record x bytes. > the "vlan action" would record x bytes. > the "redirect" would record size x+vlaninfo bytes > the egress of eth0 would recorr x+vlaninfo bytes So, I had this working for a while, by calling tcf_action_stats_update() directly on the correct element of tc->exts->actions[], instead of using tcf_exts_stats_update(). And this was fine, until I tried to port my code to the 5.1 kernel, with Pablo's flow action infrastructure. On that it's not possible, because there is only flow_stats_update(), which takes a single bytes value for _all_ the actions in the rule. Pablo, was your patch series[1] intended to change the semantics of TC action bytes-counters? What are the proper semantics now? -Ed [1] http://patchwork.ozlabs.org/cover/1035414/ ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: TC stats / hw offload question 2019-04-24 14:05 ` Edward Cree @ 2019-04-24 14:11 ` Pablo Neira Ayuso 2019-04-24 15:03 ` Edward Cree 0 siblings, 1 reply; 21+ messages in thread From: Pablo Neira Ayuso @ 2019-04-24 14:11 UTC (permalink / raw) To: Edward Cree; +Cc: Jamal Hadi Salim, netdev, Jiri Pirko, Cong Wang On Wed, Apr 24, 2019 at 03:05:05PM +0100, Edward Cree wrote: > On 06/02/2019 02:20, Jamal Hadi Salim wrote: > > The classifiers dont mod the packets. The actions do. And they > > maintain stats on the size on "entry" i.e pre-edit. > > Each action keeps its own counters. If you did something like: > > > > tc match using flower blah \ > > action vlan push tag ... \ > > action redirect to egress of eth0 > > > > And you submited a packet of size x bytes, > > then the "match" would record x bytes. > > the "vlan action" would record x bytes. > > the "redirect" would record size x+vlaninfo bytes > > the egress of eth0 would recorr x+vlaninfo bytes > So, I had this working for a while, by calling tcf_action_stats_update() > directly on the correct element of tc->exts->actions[], instead of using > tcf_exts_stats_update(). And this was fine, until I tried to port my > code to the 5.1 kernel, with Pablo's flow action infrastructure. On that > it's not possible, because there is only flow_stats_update(), which takes > a single bytes value for _all_ the actions in the rule. Then, we need to extend the API to allow to update counters per action, no driver in the tree has been supporting this so far. May I have a look at your driver code to infer what we might need? Thanks. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: TC stats / hw offload question 2019-04-24 14:11 ` Pablo Neira Ayuso @ 2019-04-24 15:03 ` Edward Cree 2019-04-25 13:23 ` Edward Cree 0 siblings, 1 reply; 21+ messages in thread From: Edward Cree @ 2019-04-24 15:03 UTC (permalink / raw) To: Pablo Neira Ayuso; +Cc: Jamal Hadi Salim, netdev, Jiri Pirko, Cong Wang On 24/04/2019 15:11, Pablo Neira Ayuso wrote: > On Wed, Apr 24, 2019 at 03:05:05PM +0100, Edward Cree wrote: >> So, I had this working for a while, by calling tcf_action_stats_update() >> directly on the correct element of tc->exts->actions[], instead of using >> tcf_exts_stats_update(). And this was fine, until I tried to port my >> code to the 5.1 kernel, with Pablo's flow action infrastructure. On that >> it's not possible, because there is only flow_stats_update(), which takes >> a single bytes value for _all_ the actions in the rule. > Then, we need to extend the API to allow to update counters per > action, no driver in the tree has been supporting this so far. May I > have a look at your driver code to infer what we might need? > > Thanks. Somewhat abridged and out-of-context, because this is for stuff that's a long way away from being ready to upstream (mumble unreleased hardware), here's the counter-handling part of it. struct efx_tc_action_set { u16 vlan_push:2; u16 vlan_pop:2; /* other action flag bits */ __be16 vlan_tci[2]; /* TCIs for vlan_push */ __be16 vlan_proto[2]; /* Ethertypes for vlan_push */ struct efx_tc_counter_index *count; int count_action_idx; /* Which tc_action uses the counter */ int count_size_delta; /* bytes per packet change */ /* other action metadata and gubbins */ }; static void calculate_count_delta(struct efx_tc_action_set *act) { act->count_size_delta = 0; if (act->vlan_pop & 2) act->count_size_delta -= 4; if (act->vlan_pop & 1) act->count_size_delta -= 4; if (act->vlan_push & 1) act->count_size_delta += 4; if (act->vlan_push & 2) act->count_size_delta += 4; } static int efx_tc_flower_replace(struct efx_nic *efx, struct net_device *net_dev, struct tc_cls_flower_offload *tc) { struct efx_tc_action_set *act; /* parse the match */ tcf_exts_for_each_action(i, a, tc->exts) { if (a->ops && a->ops->stats_update) { /* act is the hw action we're building */ act->count = allocate_a_counter(); act->count_action_idx = i; calculate_count_delta(act); } /* handle the actual action */ /* in the case of 'mirror' we'll stick 'act' on a list and * allocate a new one that starts out the same, to represent * the piped packet. */ } /* insert the rule in hw*/ return ...; } static int efx_tc_flower_stats(struct efx_nic *efx, struct net_device *net_dev, struct tc_cls_flower_offload *tc) { struct efx_tc_action_set *act; struct efx_tc_counter *cnt; u64 packets, bytes; rule = /* lookup the rule */; list_for_each_entry(act, &/*list of rule actions*/, list) if (act->count) { struct tc_action *a; u64 d_packets, d_bytes; cnt = /* get the counter */; /* Report only new pkts/bytes since last time TC asked */ packets = cnt->packets; bytes = cnt->bytes; d_packets = packets - cnt->old_packets; /* Apply corrections for any VLAN pops/pushes before * our count action (but after the HW counter) */ d_bytes = bytes - cnt->old_bytes + act->count_size_delta * d_packets; a = tc->exts->actions[act->count_action_idx]; tcf_action_stats_update(a, d_bytes, d_packets, cnt->touched, true); cnt->old_packets = packets; cnt->old_bytes = bytes; } return 0; } With this code, if for instance the rule is tc match using flower blah \ action mirred egress mirror eth1 \ action vlan push blah \ action mirred egress redirect eth2 and a packed of length 100 bytes is matched, then tc stats show will report: action mirred egress mirror eth1: 1 packets, 100 bytes action vlan push blah 0 packets, 0 bytes /* Doesn't have an ops->stats_update */ action mirred egress redirect eth2 1 packets, 104 bytes which AIUI was the correct semantics, although no drivers upstream did it. TC does have all the information it needs to make those corrections itself (it knows that the mirred eth2 comes after a vlan push), but currently it doesn't do so. So one possible solution would be for flow_stats_update() to take bytes 'as matched', and make the corrections as it goes along. Unfortunately, this doesn't quite work for decap (tunnel_key unset), because the length of the tunnel header is not statically known. (For this reason, our hw counts bytes _after_ decap but _before_ other actions like VLAN push/pop.) Since from TC's perspective tunnel_key is metadata rather than packet data, this works if we are willing to define the counters as not including tunnel headers — that's what I've done currently. Then VLAN push/pop are the only TC actions that alter packet length. It is however a bit unintuitive that in the case of tc filter add dev vxlan0 ... flower blah \ action mirred egress mirror eth1 action tunnel_key unset action mirred egress redirect eth2 both mirreds will show the same (decapped) bytes count, yet eth1 will (at least as I've interpreted it for offload) receive the packet with the encap header still present. So to implement this we could do something like (include/net/pkt_cls.h, written in mail client & not at all tested...) static inline void tcf_exts_stats_update(const struct tcf_exts *exts, u64 bytes, u64 packets, u64 lastuse) { #ifdef CONFIG_NET_CLS_ACT int i, delta = 0; preempt_disable(); for (i = 0; i < exts->nr_actions; i++) { struct tc_action *a = exts->actions[i]; tcf_action_stats_update(a, bytes + packets * delta, packets, lastuse, true); if (is_tcf_vlan(a)) switch (tcf_vlan_action(a)) { case TCA_VLAN_ACT_POP: delta -= 4; break; case TCA_VLAN_ACT_PUSH: delta += 4; break; case TCA_VLAN_ACT_MODIFY: default: break; } } preempt_enable(); #endif } Does that seem reasonable? -Ed ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: TC stats / hw offload question 2019-04-24 15:03 ` Edward Cree @ 2019-04-25 13:23 ` Edward Cree 2019-04-25 22:33 ` Pablo Neira Ayuso 0 siblings, 1 reply; 21+ messages in thread From: Edward Cree @ 2019-04-25 13:23 UTC (permalink / raw) To: Pablo Neira Ayuso; +Cc: Jamal Hadi Salim, netdev, Jiri Pirko, Cong Wang On 24/04/2019 16:03, Edward Cree wrote: > static int efx_tc_flower_replace(struct efx_nic *efx, > struct net_device *net_dev, > struct tc_cls_flower_offload *tc) > { > struct efx_tc_action_set *act; > > /* parse the match */ > > tcf_exts_for_each_action(i, a, tc->exts) { > if (a->ops && a->ops->stats_update) { > /* act is the hw action we're building */ > act->count = allocate_a_counter(); Also, this was actually taking a->tcfa_index, allowing multiple rules to share a counter. The action index doesn't seem to be available in the new flow_offload API. -Ed ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: TC stats / hw offload question 2019-04-25 13:23 ` Edward Cree @ 2019-04-25 22:33 ` Pablo Neira Ayuso 2019-04-26 12:13 ` Edward Cree 0 siblings, 1 reply; 21+ messages in thread From: Pablo Neira Ayuso @ 2019-04-25 22:33 UTC (permalink / raw) To: Edward Cree; +Cc: Jamal Hadi Salim, netdev, Jiri Pirko, Cong Wang On Thu, Apr 25, 2019 at 02:23:08PM +0100, Edward Cree wrote: > On 24/04/2019 16:03, Edward Cree wrote: > > static int efx_tc_flower_replace(struct efx_nic *efx, > > struct net_device *net_dev, > > struct tc_cls_flower_offload *tc) > > { > > struct efx_tc_action_set *act; > > > > /* parse the match */ > > > > tcf_exts_for_each_action(i, a, tc->exts) { > > if (a->ops && a->ops->stats_update) { > > /* act is the hw action we're building */ > > act->count = allocate_a_counter(); > > Also, this was actually taking a->tcfa_index, allowing multiple rules to > share a counter. The action index doesn't seem to be available in the > new flow_offload API. Could you show a bit more code to see how you use a->tcfa_index from your efx_tc_flower_replace()? Thanks. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: TC stats / hw offload question 2019-04-25 22:33 ` Pablo Neira Ayuso @ 2019-04-26 12:13 ` Edward Cree 2019-04-26 12:42 ` Jamal Hadi Salim 2019-04-26 18:49 ` Pablo Neira Ayuso 0 siblings, 2 replies; 21+ messages in thread From: Edward Cree @ 2019-04-26 12:13 UTC (permalink / raw) To: Pablo Neira Ayuso; +Cc: Jamal Hadi Salim, netdev, Jiri Pirko, Cong Wang On 25/04/2019 23:33, Pablo Neira Ayuso wrote: > On Thu, Apr 25, 2019 at 02:23:08PM +0100, Edward Cree wrote: >> On 24/04/2019 16:03, Edward Cree wrote: >>> static int efx_tc_flower_replace(struct efx_nic *efx, >>> struct net_device *net_dev, >>> struct tc_cls_flower_offload *tc) >>> { >>> struct efx_tc_action_set *act; >>> >>> /* parse the match */ >>> >>> tcf_exts_for_each_action(i, a, tc->exts) { >>> if (a->ops && a->ops->stats_update) { >>> /* act is the hw action we're building */ >>> act->count = allocate_a_counter(); >> Also, this was actually taking a->tcfa_index, allowing multiple rules to >> share a counter. The action index doesn't seem to be available in the >> new flow_offload API. > Could you show a bit more code to see how you use a->tcfa_index from > your efx_tc_flower_replace()? > > Thanks. Sure; this block is (still slightly abridged) if (a->ops && a->ops->stats_update) { struct efx_tc_counter_index *ctr; ctr = efx_tc_flower_get_counter_by_index(efx, a->tcfa_index); if (IS_ERR(ctr)) { rc = PTR_ERR(ctr); goto release; } act->count = ctr; act->count_action_idx = i; efx_tc_calculate_count_delta(act); } and we have struct efx_tc_counter_index { u32 tcfa_index; struct rhash_head linkage; refcount_t ref; u32 fw_id; }; const static struct rhashtable_params efx_tc_counter_ht_params = { .key_len = offsetof(struct efx_tc_counter_index, linkage), .key_offset = 0, .head_offset = offsetof(struct efx_tc_counter_index, linkage), }; static struct efx_tc_counter_index *efx_tc_flower_get_counter_by_index( struct efx_nic *efx, u32 idx) { struct efx_tc_counter_index *ctr, *old; long rc; ctr = kzalloc(sizeof(*ctr), GFP_USER); if (!ctr) return ERR_PTR(-ENOMEM); ctr->tcfa_index = idx; old = rhashtable_lookup_get_insert_fast(&efx->tc->counter_ht, &ctr->linkage, efx_tc_counter_ht_params); if (old) { /* don't need our new entry */ kfree(ctr); if (!refcount_inc_not_zero(&old->ref)) return ERR_PTR(-EAGAIN); /* existing entry found */ ctr = old; } else { rc = efx_tc_flower_allocate_counter(efx); if (rc < 0) { rhashtable_remove_fast(&efx->tc->counter_ht, &ctr->linkage, efx_tc_counter_ht_params); kfree(ctr); return ERR_PTR(rc); } ctr->fw_id = rc; refcount_inc(&ctr->ref); } return ctr; } Thus if (and only if) two TC actions have the same tcfa_index, they will share a single counter in the HW. I gathered from a previous conversation with Jamal[1] that that was the correct behaviour: > Note, your counters should also be shareable; example, count all > the drops in one counter across multiple flows as in the following > case where counter index 1 is used. > > tc flower match foo action drop index 1 > tc flower match bar action drop index 1 -Ed [1]: https://www.spinics.net/lists/netdev/msg551448.html ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: TC stats / hw offload question 2019-04-26 12:13 ` Edward Cree @ 2019-04-26 12:42 ` Jamal Hadi Salim 2019-04-26 18:49 ` Pablo Neira Ayuso 1 sibling, 0 replies; 21+ messages in thread From: Jamal Hadi Salim @ 2019-04-26 12:42 UTC (permalink / raw) To: Edward Cree, Pablo Neira Ayuso; +Cc: netdev, Jiri Pirko, Cong Wang On 2019-04-26 8:13 a.m., Edward Cree wrote: > Sure; this block is (still slightly abridged) > > if (a->ops && a->ops->stats_update) { > struct efx_tc_counter_index *ctr; > > ctr = efx_tc_flower_get_counter_by_index(efx, a->tcfa_index); > if (IS_ERR(ctr)) { > rc = PTR_ERR(ctr); > goto release; > } > act->count = ctr; > act->count_action_idx = i; > efx_tc_calculate_count_delta(act); > } > > and we have > > struct efx_tc_counter_index { > u32 tcfa_index; > struct rhash_head linkage; > refcount_t ref; > u32 fw_id; > }; > > const static struct rhashtable_params efx_tc_counter_ht_params = { > .key_len = offsetof(struct efx_tc_counter_index, linkage), > .key_offset = 0, > .head_offset = offsetof(struct efx_tc_counter_index, linkage), > }; > > static struct efx_tc_counter_index *efx_tc_flower_get_counter_by_index( > struct efx_nic *efx, u32 idx) > { > struct efx_tc_counter_index *ctr, *old; > long rc; > > ctr = kzalloc(sizeof(*ctr), GFP_USER); > if (!ctr) > return ERR_PTR(-ENOMEM); > ctr->tcfa_index = idx; > old = rhashtable_lookup_get_insert_fast(&efx->tc->counter_ht, > &ctr->linkage, > efx_tc_counter_ht_params); > if (old) { > /* don't need our new entry */ > kfree(ctr); > if (!refcount_inc_not_zero(&old->ref)) > return ERR_PTR(-EAGAIN); > /* existing entry found */ > ctr = old; > } else { > rc = efx_tc_flower_allocate_counter(efx); > if (rc < 0) { > rhashtable_remove_fast(&efx->tc->counter_ht, > &ctr->linkage, > efx_tc_counter_ht_params); > kfree(ctr); > return ERR_PTR(rc); > } > ctr->fw_id = rc; > refcount_inc(&ctr->ref); > } > return ctr; > } > > Thus if (and only if) two TC actions have the same tcfa_index, they will > share a single counter in the HW. > I gathered from a previous conversation with Jamal[1] that that was the > correct behaviour: Yes, this is expected behavior. Meters/policers as well used indices to indicate sharing. cheers, jamal ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: TC stats / hw offload question 2019-04-26 12:13 ` Edward Cree 2019-04-26 12:42 ` Jamal Hadi Salim @ 2019-04-26 18:49 ` Pablo Neira Ayuso 2019-04-29 14:11 ` Edward Cree 1 sibling, 1 reply; 21+ messages in thread From: Pablo Neira Ayuso @ 2019-04-26 18:49 UTC (permalink / raw) To: Edward Cree; +Cc: Jamal Hadi Salim, netdev, Jiri Pirko, Cong Wang On Fri, Apr 26, 2019 at 01:13:41PM +0100, Edward Cree wrote: > On 25/04/2019 23:33, Pablo Neira Ayuso wrote: > > On Thu, Apr 25, 2019 at 02:23:08PM +0100, Edward Cree wrote: > >> On 24/04/2019 16:03, Edward Cree wrote: > >>> static int efx_tc_flower_replace(struct efx_nic *efx, > >>> struct net_device *net_dev, > >>> struct tc_cls_flower_offload *tc) > >>> { > >>> struct efx_tc_action_set *act; > >>> > >>> /* parse the match */ > >>> > >>> tcf_exts_for_each_action(i, a, tc->exts) { > >>> if (a->ops && a->ops->stats_update) { > >>> /* act is the hw action we're building */ > >>> act->count = allocate_a_counter(); > >> Also, this was actually taking a->tcfa_index, allowing multiple rules to > >> share a counter. The action index doesn't seem to be available in the > >> new flow_offload API. > > Could you show a bit more code to see how you use a->tcfa_index from > > your efx_tc_flower_replace()? > > > > Thanks. > Sure; this block is (still slightly abridged) > > if (a->ops && a->ops->stats_update) { > struct efx_tc_counter_index *ctr; > > ctr = efx_tc_flower_get_counter_by_index(efx, a->tcfa_index); > if (IS_ERR(ctr)) { > rc = PTR_ERR(ctr); > goto release; > } > act->count = ctr; > act->count_action_idx = i; > efx_tc_calculate_count_delta(act); > } > > and we have > > struct efx_tc_counter_index { > u32 tcfa_index; > struct rhash_head linkage; > refcount_t ref; > u32 fw_id; > }; > > const static struct rhashtable_params efx_tc_counter_ht_params = { > .key_len = offsetof(struct efx_tc_counter_index, linkage), > .key_offset = 0, > .head_offset = offsetof(struct efx_tc_counter_index, linkage), > }; > > static struct efx_tc_counter_index *efx_tc_flower_get_counter_by_index( > struct efx_nic *efx, u32 idx) > { > struct efx_tc_counter_index *ctr, *old; > long rc; > > ctr = kzalloc(sizeof(*ctr), GFP_USER); > if (!ctr) > return ERR_PTR(-ENOMEM); > ctr->tcfa_index = idx; > old = rhashtable_lookup_get_insert_fast(&efx->tc->counter_ht, > &ctr->linkage, > efx_tc_counter_ht_params); > if (old) { > /* don't need our new entry */ > kfree(ctr); > if (!refcount_inc_not_zero(&old->ref)) > return ERR_PTR(-EAGAIN); > /* existing entry found */ > ctr = old; > } else { > rc = efx_tc_flower_allocate_counter(efx); > if (rc < 0) { > rhashtable_remove_fast(&efx->tc->counter_ht, > &ctr->linkage, > efx_tc_counter_ht_params); > kfree(ctr); > return ERR_PTR(rc); > } > ctr->fw_id = rc; > refcount_inc(&ctr->ref); > } > return ctr; > } > > Thus if (and only if) two TC actions have the same tcfa_index, they will > share a single counter in the HW. > I gathered from a previous conversation with Jamal[1] that that was the > correct behaviour: > > Note, your counters should also be shareable; example, count all > > the drops in one counter across multiple flows as in the following > > case where counter index 1 is used. > > > > tc flower match foo action drop index 1 > > tc flower match bar action drop index 1 The flow_action_entry structure needs a new 'counter_index' field to store this. The tc_setup_flow_action() function needs to be updated for this for the FLOW_ACTION_{ACCEPT,DROP,REDIRECT,MIRRED} cases to set this entry->counter_index field to tcfa_index, so the driver has access to this. Thanks. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: TC stats / hw offload question 2019-04-26 18:49 ` Pablo Neira Ayuso @ 2019-04-29 14:11 ` Edward Cree 2019-04-29 15:21 ` Pablo Neira Ayuso 0 siblings, 1 reply; 21+ messages in thread From: Edward Cree @ 2019-04-29 14:11 UTC (permalink / raw) To: Pablo Neira Ayuso; +Cc: Jamal Hadi Salim, netdev, Jiri Pirko, Cong Wang On 26/04/2019 19:49, Pablo Neira Ayuso wrote: > On Fri, Apr 26, 2019 at 01:13:41PM +0100, Edward Cree wrote: >> Thus if (and only if) two TC actions have the same tcfa_index, they will >> share a single counter in the HW. >> I gathered from a previous conversation with Jamal[1] that that was the >> correct behaviour: >>> Note, your counters should also be shareable; example, count all >>> the drops in one counter across multiple flows as in the following >>> case where counter index 1 is used. >>> >>> tc flower match foo action drop index 1 >>> tc flower match bar action drop index 1 > The flow_action_entry structure needs a new 'counter_index' field to > store this. The tc_setup_flow_action() function needs to be updated > for this for the FLOW_ACTION_{ACCEPT,DROP,REDIRECT,MIRRED} cases to > set this entry->counter_index field to tcfa_index, so the driver has > access to this. Hmm, I'm still not sure this solves everything. Before, we could write tc flower match foo \ action mirred egress mirror eth1 index 1 \ action mirred egress redirect eth2 index 2 and have two distinct HW counters (one of which might e.g. be shared with another rule). But when reading those counters, under fl_hw_update_stats(), the driver only gets to return one set of flow stats for both actions. Previously, the driver's TC_CLSFLOWER_STATS handler was updating the action stats directly, so was able to do something different for each action, but that's not possible in 5.1. At stats gathering time, the driver doesn't even have access to anything that's per-action and thus could have a flow_stats member shoved in it. AFAICT, the only reason this isn't a regression is that existing drivers didn't implement the old semantics correctly. This is a bit of a mess; the best idea I've got is for the TC_CLSFLOWER_STATS call to include a tcfa_index. Then the driver returns counter stats for that index, and tcf_exts_stats_update() only updates those actions whose index matches. But then fl_hw_update_stats() would have to iterate over all the indices in f->exts. What do you think? -Ed ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: TC stats / hw offload question 2019-04-29 14:11 ` Edward Cree @ 2019-04-29 15:21 ` Pablo Neira Ayuso 2019-04-29 16:25 ` Edward Cree 0 siblings, 1 reply; 21+ messages in thread From: Pablo Neira Ayuso @ 2019-04-29 15:21 UTC (permalink / raw) To: Edward Cree; +Cc: Jamal Hadi Salim, netdev, Jiri Pirko, Cong Wang On Mon, Apr 29, 2019 at 03:11:06PM +0100, Edward Cree wrote: > On 26/04/2019 19:49, Pablo Neira Ayuso wrote: > > On Fri, Apr 26, 2019 at 01:13:41PM +0100, Edward Cree wrote: > >> Thus if (and only if) two TC actions have the same tcfa_index, they will > >> share a single counter in the HW. > >> I gathered from a previous conversation with Jamal[1] that that was the > >> correct behaviour: > >>> Note, your counters should also be shareable; example, count all > >>> the drops in one counter across multiple flows as in the following > >>> case where counter index 1 is used. > >>> > >>> tc flower match foo action drop index 1 > >>> tc flower match bar action drop index 1 > > The flow_action_entry structure needs a new 'counter_index' field to > > store this. The tc_setup_flow_action() function needs to be updated > > for this for the FLOW_ACTION_{ACCEPT,DROP,REDIRECT,MIRRED} cases to > > set this entry->counter_index field to tcfa_index, so the driver has > > access to this. > Hmm, I'm still not sure this solves everything. > Before, we could write > tc flower match foo \ > action mirred egress mirror eth1 index 1 \ > action mirred egress redirect eth2 index 2 > and have two distinct HW counters (one of which might e.g. be shared > with another rule). But when reading those counters, under > fl_hw_update_stats(), the driver only gets to return one set of flow > stats for both actions. > Previously, the driver's TC_CLSFLOWER_STATS handler was updating the > action stats directly, so was able to do something different for each > action, but that's not possible in 5.1. At stats gathering time, the > driver doesn't even have access to anything that's per-action and > thus could have a flow_stats member shoved in it. > AFAICT, the only reason this isn't a regression is that existing > drivers didn't implement the old semantics correctly. > > This is a bit of a mess; the best idea I've got is for the > TC_CLSFLOWER_STATS call to include a tcfa_index. Then the driver > returns counter stats for that index, and tcf_exts_stats_update() > only updates those actions whose index matches. But then > fl_hw_update_stats() would have to iterate over all the indices in > f->exts. What do you think? You could extend struct flow_stats to pass an array of stats to the driver, including one stats per action and the counter index. Then, tcf_exts_stats_update() uses this array of stats to update per-action stats. struct flow_action_stats { u32 counter_index; u64 pkts; u64 bytes; u64 lastused; }; struct flow_stats { struct flow_action_stats *stats[]; u32 num_actions; }; As you mentioned, no driver supports for tcfa_index so far, probably it would be a good idea to return -EOPNOTSUPP in such case by now. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: TC stats / hw offload question 2019-04-29 15:21 ` Pablo Neira Ayuso @ 2019-04-29 16:25 ` Edward Cree 2019-04-29 19:14 ` Pablo Neira Ayuso 0 siblings, 1 reply; 21+ messages in thread From: Edward Cree @ 2019-04-29 16:25 UTC (permalink / raw) To: Pablo Neira Ayuso; +Cc: Jamal Hadi Salim, netdev, Jiri Pirko, Cong Wang On 29/04/2019 16:21, Pablo Neira Ayuso wrote: > On Mon, Apr 29, 2019 at 03:11:06PM +0100, Edward Cree wrote: >> This is a bit of a mess; the best idea I've got is for the >> TC_CLSFLOWER_STATS call to include a tcfa_index. Then the driver >> returns counter stats for that index, and tcf_exts_stats_update() >> only updates those actions whose index matches. But then >> fl_hw_update_stats() would have to iterate over all the indices in >> f->exts. What do you think? > You could extend struct flow_stats to pass an array of stats to the > driver, including one stats per action and the counter index. Then, > tcf_exts_stats_update() uses this array of stats to update per-action > stats. Yes, but that means allocating the flow_stats.stats array each time; I'd rather avoid memory allocation unless it's necessary. As long as we can move the preempt_disable() inside the loop that's currently in tcf_exts_stats_update() (i.e. only disable pre-emption across each individual call to tcf_action_stats_update()) I think we can. I think I prefer my approach (ask for one tcfa_index at a time); but unmodified drivers that don't look at the passed index would return zeroes for actions after the first, so we'll need some way to handle those drivers separately (e.g. one tc_setup_cb_call with "answer this one if you don't do indices" and a bunch more with specified index values). I think that requires much less change to the existing drivers than putting an array back in the API, and keeps as much of the work as possible in the core where it won't have to be replicated in every driver. I'll put an RFC patch together soonish if no objections. -Ed ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: TC stats / hw offload question 2019-04-29 16:25 ` Edward Cree @ 2019-04-29 19:14 ` Pablo Neira Ayuso 0 siblings, 0 replies; 21+ messages in thread From: Pablo Neira Ayuso @ 2019-04-29 19:14 UTC (permalink / raw) To: Edward Cree; +Cc: Jamal Hadi Salim, netdev, Jiri Pirko, Cong Wang On Mon, Apr 29, 2019 at 05:25:10PM +0100, Edward Cree wrote: > On 29/04/2019 16:21, Pablo Neira Ayuso wrote: > > On Mon, Apr 29, 2019 at 03:11:06PM +0100, Edward Cree wrote: > >> This is a bit of a mess; the best idea I've got is for the > >> TC_CLSFLOWER_STATS call to include a tcfa_index. Then the driver > >> returns counter stats for that index, and tcf_exts_stats_update() > >> only updates those actions whose index matches. But then > >> fl_hw_update_stats() would have to iterate over all the indices in > >> f->exts. What do you think? > > You could extend struct flow_stats to pass an array of stats to the > > driver, including one stats per action and the counter index. Then, > > tcf_exts_stats_update() uses this array of stats to update per-action > > stats. > Yes, but that means allocating the flow_stats.stats array each time; We use the stack to attach a reasonable size array, eg. 16 actions, otherwise fall back to kmalloc(). I haven't seen any driver in the tree supporting more than that so far. > I'd rather avoid memory allocation unless it's necessary. As long as > we can move the preempt_disable() inside the loop that's currently in > tcf_exts_stats_update() (i.e. only disable pre-emption across each > individual call to tcf_action_stats_update()) I think we can. > I think I prefer my approach (ask for one tcfa_index at a time); but > unmodified drivers that don't look at the passed index would return > zeroes for actions after the first, so we'll need some way to handle > those drivers separately (e.g. one tc_setup_cb_call with "answer > this one if you don't do indices" and a bunch more with specified > index values). I think that requires much less change to the > existing drivers than putting an array back in the API, and keeps as > much of the work as possible in the core where it won't have to be > replicated in every driver. That's all right. This chunk update will not be particularly large, so we can change it anytime in the future. > I'll put an RFC patch together soonish if no objections. Sure, thanks. ^ permalink raw reply [flat|nested] 21+ messages in thread
end of thread, other threads:[~2019-04-29 19:14 UTC | newest] Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-02-05 19:41 TC stats / hw offload question Edward Cree 2019-02-06 2:20 ` Jamal Hadi Salim 2019-02-08 10:26 ` Edward Cree 2019-02-09 17:39 ` Jamal Hadi Salim 2019-02-11 11:44 ` Edward Cree 2019-02-14 12:39 ` Jamal Hadi Salim 2019-02-14 15:17 ` Andy Gospodarek 2019-02-18 18:56 ` Edward Cree 2019-02-18 19:37 ` Edward Cree 2019-04-24 14:05 ` Edward Cree 2019-04-24 14:11 ` Pablo Neira Ayuso 2019-04-24 15:03 ` Edward Cree 2019-04-25 13:23 ` Edward Cree 2019-04-25 22:33 ` Pablo Neira Ayuso 2019-04-26 12:13 ` Edward Cree 2019-04-26 12:42 ` Jamal Hadi Salim 2019-04-26 18:49 ` Pablo Neira Ayuso 2019-04-29 14:11 ` Edward Cree 2019-04-29 15:21 ` Pablo Neira Ayuso 2019-04-29 16:25 ` Edward Cree 2019-04-29 19:14 ` Pablo Neira Ayuso
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).