All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jiri Pirko <jiri@resnulli.us>
To: John Fastabend <john.fastabend@gmail.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>,
	Saeed Mahameed <saeedm@mellanox.com>,
	"David S. Miller" <davem@davemloft.net>,
	netdev@vger.kernel.org, Hadar Hen-Zion <hadarh@mellanox.com>,
	Jiri Pirko <jiri@mellanox.com>,
	Andy Gospodarek <gospo@cumulusnetworks.com>,
	Jesse Brandeburg <jesse.brandeburg@intel.com>,
	John Fastabend <john.r.fastabend@intel.com>,
	Ido Schimmel <idosch@mellanox.com>
Subject: Re: [PATCH net-next 08/16] net/devlink: Add E-Switch mode control
Date: Tue, 28 Jun 2016 20:46:55 +0200	[thread overview]
Message-ID: <20160628184655.GB2089@nanopsycho.orion> (raw)
In-Reply-To: <5772B18A.9000300@gmail.com>

Tue, Jun 28, 2016 at 07:19:06PM CEST, john.fastabend@gmail.com wrote:
>On 16-06-28 09:19 AM, John Fastabend wrote:
>> On 16-06-28 03:25 AM, Or Gerlitz wrote:
>>> On 6/28/2016 8:57 AM, John Fastabend wrote:
>>>> On 16-06-27 09:07 AM, Saeed Mahameed wrote:
>>>>> Add the commands to set and show the mode of SRIOV E-Switch, two
>>>>> modes are supported:
>>>>>
>>>>> * legacy   : operating in the "old" L2 based mode (DMAC --> VF vport)
>>>>> * offloads : offloading SW rules/policy (e.g Bridge/FDB or TC/Flows
>>>>> based) set by the host OS
>>>>>
>>>>> Nice work overall also I really appreciated that the core networking
>>>>> interfaces appear to able to support this without any change.
>>>
>>> thanks..
>>>
>>>>> On this patch though do we really need modes like this? My concern with
>>>>> modes is two fold. One its another knob that some controller will have
>>>>> to get right which I would prefer to avoid. And two I suspect switching
>>>>> between the two modes flushes the tables or leaves them in some
>>>>> unexpected state? At least I can't figure out what the expected should
>>>>> be off-hand.
>>>
>>> Re the 1st concern (another knob), I think we do want that, see below
>>>
>>> Re the 2nd concern, I will re-read the cover letter and change logs and
>>> if needed clarify/improve: the transition is clean! When you are moving
>>> from legacy to offloads or the other way around, nothing is left in
>>> unexpected state,  all HW forwarding tables as filled by the current
>>> mode are flushed and next they are set as needed for the new mode.
>>>
>> 
>> OK if I had read the entire patch series maybe I would have caught this
>> :)
>> 
>>>>> Could we instead continue to use the "legacy" mode by default by just
>>>>> populating the fdb table correctly and then if users want to enable
>>>>> the "offloads" mode they can modify the fdb tables by deleting entries
>>>>> or adding them or just extending the dmac/vf mapping via 'tc'. This
>>>>> would seem natural to me. The flooding rules in fdb might need to be
>>>>> exposed a bit more cleanly to get the right default flooding behavior
>>>>> etc. But to me at least this would be much cleaner. Everything will be
>>>>> nicely defined and we wont have issues with drivers doing slightly
>>>>> and subtle different defaults between legacy/offload and the transitions
>>>>> between the states or on resets or etc. If users need to discover the
>>>>> current configuration then they just query fdb, query tc, and the state
>>>>> is known no need for any magic toggle switch as best I can see.
>>>
>>>
>>> Few comments here:
>>>
>>> Each mode has it's own way of the driver doing setup for the HW tables
>>> and how population of the HW tables is done.
>> 
>> hmm so in the hardware I have there is actually a l2 table and various
>> other tables so I don't have any issue with doing table setup. I would
>> like to see a table_create/table_delete/table_show devlink commands at
>> some point though but I'm not there yet. This would allow users to
>> optimize the table slices if they cared to. But that is future work
>> IMO. Certainly not needed in this series at least. If you want I can
>> show you a patch I had for this against rocker but it was before devlink
>> so it would need some porting.
>> 
>>>
>>> The offloads mode needs to create a black hole miss rule and
>>> send-to-vport rules and create the tables so they can contain later
>>> rules set by the kernel in a way which is HW/driver dependent.
>> 
>> Agreed a black hole miss rule needs to be applied but rather than apply
>> it automatically with some toggle I would prefer to just add a 'tc' rule
>> for this. Or alternatively it can be added by configuring flooding
>> ports so that only a single port is in the flooding mode. This could
>> all be done via 'bridge fdb ...' and 'bridge link ...' today I believe.
>> Then the user defines the state and not the driver writer. It really is
>> cleaner in my opinion.
>> 
>> One oddball case I have is if I have two PF functions behind a single
>> network facing port. Yes its a bit strange but in this case its nice to
>> pick which host facing PF to flood on vs the driver picking one.
>> 
>> And send-to-vport rules I'm not entirely clear on what these actually
>> are used for. Is this a rule to match packets sent from a VF representer
>> netdev to the actual VF pcie device? If this is the case its seems to
>> me that any packet sent on a VF representer should be sent to the VF
>> directly and these rules can be created when the VF is created. Or did
>> you mean some other rule by this?
>> 
>>>
>>> The legacy mode creates the tables differently and populates them later
>>> with rule set by
>>> the driver and not the kernel.
>>>
>>> Even if we put the different table setup issue a side, I don't think it
>>> would be correct for bridge/tc to remove rules they didn't add, which is
>>> needed under your proposal when moving from legacy type rules to
>>> offloads mode. Querying is problematic too, since legacy could (and
>>> does) involve some default rules set by the FW, e.g that deals with
>>> outer world (== not belonging to VM on this host) MACs which are
>>> invisible to the driver.
>> 
>> But even legacy mode should report the correct fdb table and setup.
>> I don't think querying should be a problem if the driver reports the
>> configuration correctly. This allows us visibility into the driver
>> default case so we don't have to guess what driver X writer implemented.
>> 
>>>
>>> That legacy was here and we can't avoid handling it properly for which
>>> this knob is needed. Note that a vendor can choose to put their default
>>> to be offloads, hopefully over time, we will all go there :)
>>>
>> 
>> But you can come up in legacy mode and report it via the existing
>> mechanisms 'tc', 'bridge', etc. and then users can transition to any
>> mode they like using the tools.
>> 
>> I really don't think the switch here is necessary if you implement the
>> bridge hooks and tc hooks. cls_u32 can handle this for example and I
>> would expect flower can as well if you want to do mgmt via flow based
>> tc commands. And the bridge tool has the attributes for per port
>> flooding but not sure off-hand if its packed into the msg sent to the
>> driver. But we could fix that fairly easily in another patch series if
>> needed.
>> 
>
>Actually with a bit more thought it might be nice to have a
>flag to enable/disable creation of vf netdev representer in case it
>somehow causes issues with existing software. We typically
>enable/disable features with ethtool feature flags though not via
>devlink so I think it would fit better as an ethtool flag same as
>all the other hardware features.

This is not a property of a netdevice, but a devlink device. That should
be a handle of creating/not creating representors. And I think that what
this patch is doing serves that purpose as well. For legacy mode, the
representors are not created, for offload/switchdev mode they are
created.

Does not make sense to have this in ethtool one bit to me.


>
>The above points in the last mail are more about how it influences the
>forwarding rules in the switch and my preference would be that it
>doesn't change how the forwarding works in the switch and instead the
>forwarding state is managed via standard tools 'tc', 'bridge', etc. So
>I think my comments are still relevant. However as long as when we
>query the nic/switch we get the correct information back in any mode
>I'm not too concerned I suspect any software that actually uses this
>will have to query and reconfigure either way counting on driver
>writers to get policy correct is not a stable way to write usermode
>software.
>
>All that said I don't plan to change the forwarding state this way
>with the intel drivers when implementing the vf representer.
>
>Yet another reason not to change the state of the forwarding rules is
>even on older hardware that only supports l2 mac/vlan based forwarding
>having a VF representer is useful to configure the device and send/recv
>some basic control packets (e.g. lldp). On these devices l2 mac/vlan
>mode is the only one supported.
>
>>>>> Otherwise I didn't review the mlx code but read the commit msgs and
>>>>> it looks good. I'll take a closer look in the morning.
>>>
>>> appreciated
>>>
>> 
>

  reply	other threads:[~2016-06-28 18:46 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-06-27 16:07 [PATCH net-next 00/16] Mellanox 100G SRIOV E-Switch offload and VF representors Saeed Mahameed
2016-06-27 16:07 ` [PATCH net-next 01/16] net/mlx5: E-Switch, Add operational mode to the SRIOV e-Switch Saeed Mahameed
2016-06-27 16:07 ` [PATCH net-next 02/16] net/mlx5: E-Switch, Add support for the sriov offloads mode Saeed Mahameed
2016-06-27 16:07 ` [PATCH net-next 03/16] net/mlx5: E-Switch, Add miss rule for " Saeed Mahameed
2016-06-27 16:53   ` Sergei Shtylyov
2016-06-27 20:40     ` Or Gerlitz
2016-06-27 16:07 ` [PATCH net-next 04/16] net/mlx5: E-Switch, Add API to create send-to-vport rules Saeed Mahameed
2016-06-27 16:07 ` [PATCH net-next 05/16] net/mlx5: Introduce offloads steering namespace Saeed Mahameed
2016-06-27 16:07 ` [PATCH net-next 06/16] net/mlx5: E-Switch, Add offloads table Saeed Mahameed
2016-06-27 16:07 ` [PATCH net-next 07/16] net/mlx5: E-Switch, Add API to create vport rx rules Saeed Mahameed
2016-06-27 16:07 ` [PATCH net-next 08/16] net/devlink: Add E-Switch mode control Saeed Mahameed
2016-06-28  5:57   ` John Fastabend
2016-06-28 10:25     ` Or Gerlitz
2016-06-28 16:19       ` John Fastabend
2016-06-28 17:19         ` John Fastabend
2016-06-28 18:46           ` Jiri Pirko [this message]
2016-06-28 19:04             ` Samudrala, Sridhar
2016-06-28 19:12               ` Jiri Pirko
2016-06-28 19:31                 ` John Fastabend
2016-06-29 14:48                   ` Or Gerlitz
2016-06-29 16:35                     ` John Fastabend
2016-06-29 21:33                       ` Or Gerlitz
2016-06-29 22:09                         ` John Fastabend
2016-06-30  3:35                           ` John Fastabend
2016-06-30  4:04                             ` John Fastabend
2016-06-30  6:25                               ` Jiri Pirko
2016-06-30  7:13                                 ` Samudrala, Sridhar
2016-06-30  7:41                                   ` Jiri Pirko
2016-06-30  7:57                                     ` John Fastabend
2016-06-30 10:52                                       ` Jiri Pirko
2016-06-30 14:24                                         ` Or Gerlitz
2016-06-30 15:40                                         ` John Fastabend
2016-06-30 15:53                                           ` Jiri Pirko
2016-06-30 16:29                                             ` John Fastabend
2016-06-29  9:44         ` Or Gerlitz
2016-06-28 12:27   ` Jiri Pirko
2016-06-27 16:07 ` [PATCH net-next 09/16] net/mlx5: Add devlink interface Saeed Mahameed
2016-06-27 16:07 ` [PATCH net-next 10/16] net/mlx5e: Add devlink based SRIOV mode changes (legacy --> offloads) Saeed Mahameed
2016-06-28 13:42   ` Andy Gospodarek
2016-06-28 14:25     ` Or Gerlitz
2016-06-28 14:49       ` Andy Gospodarek
2016-06-27 16:07 ` [PATCH net-next 11/16] net/mlx5e: Create NIC global resources only once Saeed Mahameed
2016-06-27 16:07 ` [PATCH net-next 12/16] net/mlx5e: TIRs management refactoring Saeed Mahameed
2016-06-27 16:07 ` [PATCH net-next 13/16] net/mlx5e: Mark enabled RQTs instances explicitly Saeed Mahameed
2016-06-27 16:07 ` [PATCH net-next 14/16] net/mlx5e: Add support for multiple profiles Saeed Mahameed
2016-06-27 16:07 ` [PATCH net-next 15/16] net/mlx5: Add Representors registration API Saeed Mahameed
2016-06-27 16:07 ` [PATCH net-next 16/16] net/mlx5e: Introduce SRIOV VF representors Saeed Mahameed

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160628184655.GB2089@nanopsycho.orion \
    --to=jiri@resnulli.us \
    --cc=davem@davemloft.net \
    --cc=gospo@cumulusnetworks.com \
    --cc=hadarh@mellanox.com \
    --cc=idosch@mellanox.com \
    --cc=jesse.brandeburg@intel.com \
    --cc=jiri@mellanox.com \
    --cc=john.fastabend@gmail.com \
    --cc=john.r.fastabend@intel.com \
    --cc=netdev@vger.kernel.org \
    --cc=ogerlitz@mellanox.com \
    --cc=saeedm@mellanox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.