All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Fastabend, John R" <john.fastabend@gmail.com>
To: Jamal Hadi Salim <jhs@mojatatu.com>, Or Gerlitz <ogerlitz@mellanox.com>
Cc: amir@vadai.me, jiri@resnulli.us, jeffrey.t.kirsher@intel.com,
	netdev@vger.kernel.org, davem@davemloft.net
Subject: Re: [net-next PATCH 0/7] tc offload for cls_u32 on ixgbe
Date: Tue, 9 Feb 2016 03:24:27 -0800	[thread overview]
Message-ID: <56B9CC6B.6050509@gmail.com> (raw)
In-Reply-To: <56B34E35.3020307@mojatatu.com>

On 2/4/2016 5:12 AM, Jamal Hadi Salim wrote:
> 
> On 16-02-03 01:48 PM, Fastabend, John R wrote:
> 
> BTW: For the record John, I empathize with you that we need to
> move. Please have patience - we are close; lets just get this resolved
> in Seville. I like your patches a lot and would love to just have
> your patches pushed in, but the challenges with community is being able
> to reach some middle ground. We are not as bad as some of the standards
> organizations. I am sure we'll get this resolved by end of next week
> if not, I am %100 in agreement some form of your patches (And Amir's
> need to go in and then we can refactor as needed)

Agreed although I'm a bit worried we are starting to talk about a
single hardware IR. This discussion has always failed in my experience.

> 
>>> 1) "priorities" for filters and some form of "index" for actions is
>>> is needed. I think index (which tends to be a 32 bit value is what
>>> Amir's patches refered to as "cookie" - or at least some hardware
>>> can be used to query the action with). Priorities maybe implicit in
>>> the order in which they are added. And th idea of appending vs
>>> exclusivity vs replace (which  netlink already supports)
>>> is important to worry about (TCAMS tend to assume an append mode
>>> for example).
>>
>> The code denotes add/del/replace already. I'm not sure why a TCAM
>> would assume an append mode but OK maybe that is some API you have
>> the APIs I use don't have these semantics.
>>
> 
> Basically most hardware (or i should say driver implementations of
> mostly TCAMS) allow you to add exactly the same filter as many times
> as you want. They dont really look at what you want to filter on
> and then scream "conflict". IOW, you (user) are responsible for
> conflict resolution at the filter level. The driver sees this blob
> and requests for some index/key from the hardware then just adds it.
> You can then use this key/index to delete/replace etc.
> This is what i meant by "append" mode.
> However if a classifier implementation cares about filter ambiguity
> resolution, then priorities are used. We need to worry about the
> bigger picture.
> 

Sure in other classifiers its used but its not needed in the set I
planned to added it later.

> 
>> For this series using cls_u32 the handle gives you everything you need
>> to put entries in the right table and row. Namely the ht # and order #
>> from 'tc'.
> 
> True - but with a caveat. There are only 2^12 max tables you can
> have for example and up to 2^12 filters per bucket etc.
> 

This is a software limitation as well right? If it hasn't showed up
as a limitation on the software side why would it be an issue here?
Do you have more than 2^12 tables on your devices? If so I guess we
can tack on another 32bits somewhere.

>> Take a look at u32_change and u32_classify its the handle
>> that places the filter into the list and the handle that is matched in
>> classify. We should place the filters in the hardware in the same order
>> that is used by u32_change.
>>
> 
> I can see some parallels, but:
> The nodeid in itself is insufficent for two reasons:
> You cant have more than 2^12 filters per bucket;
> and the nodeid then takes two meanings: a) it is an id
> b) it specifies the order in which things are looked up.
> 
> I think you need to take the u32 address and map it to something in your
> hardware. But at the same time it is important to have the abstraction
> closely emulate your hardware.
> 

IMO the hardware/interface must preserve the same ordering of
filters/hash_Tables/etc. How it does that mapping should be
a driver concern and it can always abort if it fails.

>> Also ran a few tests and can't see how priority works in u32 maybe you
>> can shed some light but as best I can tell it doesn't have any effect
>> on rule execution.
>>
> 
> True.
> u32 doesnt care because it will give you a nodeid if you dont specify
> one. i.e conflict resolution is mapped to you not specifying exactly
> the same ht:bkt:nodeid more than once. And if you will let the
> kernel do it for you (as i am assumming you are saying your hardware
> will) then no need.

Yep. Faithfully offloading u32 here not changing anything except
I do have to abort on some cases with the simpler devices. fm10k for
example can model hash nodes with divisors > 1.

> 
>>>
>>> 2) I like the u32 approach where it makes sense; but sometimes it
>>> doesnt make sense from a usability pov. I work with some ASICs
>>> that have 10 tuples that are  fixed. Yes, a user can describe a policy
>>> with u32 but flower would be more  usable say with flower (both
>>> programmatic and cli)
>>
>> Sure so create a set of offload hooks for flower we don't need only
>> one hardware classifier any more than we would like a single software
>> classifiers.
> 
> 
> Glad to hear that.
> I was a little concerned that despite my love for u32 it was
> going to be _the_ classifier. It doesnt fit for all offload cases
> and sometimes it is because of human operators (the 10 tuple
> hardware classifier i mentioned earlier).
> BTW: Classifier in this case is very wide ranging (a regex hardware
> offload for example qualifies).

My issue is we can map flower onto u32 that is fine and u32 onto
bpf. But we lose a lot of the power of each classifier when we
do this. flower for example is nice because of its simplicity
presumably this translates into faster updates, u32 is great because
we get full parse graph support and hash tables, ebpf is the biggest
beast of all and lets us load arbitrary functions into the device.
All are nice in their own right.

> 
>>
>> Again I'm trying to faithfully implement what we have in software
>> and load that into the hardware. The handle today gives ingress/egres
>> hook. If you want an all ports hook we should add it to 'tc' software
>> first and then push that to the hardware not create magic hardware
>> bits. See I've drank the cool aid software first than hardware.
>>
> 
> ;-> No disagreement. It felt like a small sensible
> change - thats why i suggested it.
>

Yep its in the git log if we can get past this initial series.


>>> 4) Why are we forsaking switchdev John?
>>> This is certainly re-usable beyond NICs and SRIOV.
>>>
>>
>> Sure and switchdev can use it just like they use fdb_add and friends.
>> I just don't want to require switchdev infrastructure on things that
>> really are not switches. I think Amir indicated he would take a try
>> at the switchdev integration. If not I'm willing to do it but it
>> doesn't block this series in any way imo.
>>
> 
> Ok. Makes sense.
> 

Great!

>>> 5)What happened to being both able to hardware and/or software?
>>
>> Follow up patch once we get the basic infrastructure in place with
>> the big feature flag bit. I have a patch I'm testing for this now
>> but again I want to move in logical and somewhat minimal sets.
>>
> 
> Sounds sensible.
> 
>>>
>>> Anyways, I think Seville would be a blast! Come one, come all.
>>>
>>
>> I'll be there but lets be sure to follow up with this online I
>> know folks are following this who wont be at Seville and I don't
>> see any reason to block these patches and stop the thread for a
>> week or more.
>>
> 
> I really dont see much of a blocker.

Perfect hopefully it didn't get thrashed on too much last couple
days. I'll be in Seville in a couple hours!

> 
> cheers,
> jamal

  reply	other threads:[~2016-02-09 11:24 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-02-03  9:27 [net-next PATCH 0/7] tc offload for cls_u32 on ixgbe John Fastabend
2016-02-03  9:27 ` [net-next PATCH 1/7] net: rework ndo tc op to consume additional qdisc handle parameter John Fastabend
2016-02-03  9:58   ` kbuild test robot
2016-02-03  9:59   ` kbuild test robot
2016-02-03 11:44   ` kbuild test robot
2016-02-03  9:28 ` [net-next PATCH 2/7] net: rework setup_tc ndo op to consume general tc operand John Fastabend
2016-02-03  9:28 ` [net-next PATCH 3/7] net: sched: add cls_u32 offload hooks for netdevs John Fastabend
2016-02-03 10:14   ` kbuild test robot
2016-02-04 13:18   ` Amir Vadai"
2016-02-09 11:09     ` Fastabend, John R
2016-02-03  9:28 ` [net-next PATCH 4/7] net: add tc offload feature flag John Fastabend
2016-02-03  9:29 ` [net-next PATCH 5/7] net: tc: helper functions to query action types John Fastabend
2016-02-03  9:29 ` [net-next PATCH 6/7] net: ixgbe: add minimal parser details for ixgbe John Fastabend
2016-02-03  9:29 ` [net-next PATCH 7/7] net: ixgbe: add support for tc_u32 offload John Fastabend
2016-02-03 10:07   ` Amir Vadai"
2016-02-03 10:26     ` John Fastabend
2016-02-03 12:46       ` Jamal Hadi Salim
2016-02-03 19:02         ` Fastabend, John R
2016-02-09 11:30         ` Fastabend, John R
2016-02-04  7:30   ` Amir Vadai"
2016-02-04  8:23     ` Fastabend, John R
2016-02-04 12:12       ` Amir Vadai"
2016-02-09 11:27         ` Fastabend, John R
2016-02-03 10:11 ` [net-next PATCH 0/7] tc offload for cls_u32 on ixgbe Amir Vadai"
2016-02-03 10:21   ` John Fastabend
2016-02-03 10:31     ` Or Gerlitz
2016-02-03 12:21       ` Jamal Hadi Salim
2016-02-03 18:48         ` Fastabend, John R
2016-02-04 13:12           ` Jamal Hadi Salim
2016-02-09 11:24             ` Fastabend, John R [this message]
2016-02-09 12:20               ` Jamal Hadi Salim
2016-02-04  9:16 ` Jiri Pirko
2016-02-04 23:19   ` Pablo Neira Ayuso
2016-02-09 11:06     ` Fastabend, John R

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=56B9CC6B.6050509@gmail.com \
    --to=john.fastabend@gmail.com \
    --cc=amir@vadai.me \
    --cc=davem@davemloft.net \
    --cc=jeffrey.t.kirsher@intel.com \
    --cc=jhs@mojatatu.com \
    --cc=jiri@resnulli.us \
    --cc=netdev@vger.kernel.org \
    --cc=ogerlitz@mellanox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.