All of lore.kernel.org
 help / color / mirror / Atom feed
From: John Fastabend <john.fastabend@gmail.com>
To: Or Gerlitz <ogerlitz@mellanox.com>, Jiri Pirko <jiri@resnulli.us>
Cc: "Samudrala, Sridhar" <sridhar.samudrala@intel.com>,
	Saeed Mahameed <saeedm@mellanox.com>,
	"David S. Miller" <davem@davemloft.net>,
	netdev@vger.kernel.org, Hadar Hen-Zion <hadarh@mellanox.com>,
	Jiri Pirko <jiri@mellanox.com>,
	Andy Gospodarek <gospo@cumulusnetworks.com>,
	Jesse Brandeburg <jesse.brandeburg@intel.com>,
	John Fastabend <john.r.fastabend@intel.com>,
	Ido Schimmel <idosch@mellanox.com>,
	Tal Anker <Ankertal@mellanox.com>
Subject: Re: [PATCH net-next 08/16] net/devlink: Add E-Switch mode control
Date: Wed, 29 Jun 2016 09:35:24 -0700	[thread overview]
Message-ID: <5773F8CC.4090300@gmail.com> (raw)
In-Reply-To: <0327b6de-dfdf-8db2-1354-b7a4144dbc02@mellanox.com>

On 16-06-29 07:48 AM, Or Gerlitz wrote:
> On 6/28/2016 10:31 PM, John Fastabend wrote:
>> On 16-06-28 12:12 PM, Jiri Pirko wrote:
>>>
>>> Why?! Please, leave legacy be legacy. Use the new mode for
>>> implementing new features. Don't make things any more complicated :(
>>>
>> OK so how I read this is there are two things going on that are being
>> conflated together. Creating VF netdev's is linked to the PCIe
>> subsystems and brings VFs into the netdev model. This is a good thing
>> but doesn't need to be a global nic policy it can be per port hence
>> the ethtool flag vs devlink discussion. I don't actually have a use case
>> to have one port with VF netdevs and another without it so I'm not too
>> particular on this. Logically it looks like a per port setting because
>> the hardware has no issues with making one physical function create
>> a netdev for each of its VFs and the other one run without these
>> netdevs. This is why I called it out.
>>
>> How this relates to bridge, tc, etc. is now you have a identifier to
>> configure instead of using strange 'ip link set ... vf#' commands. This
>> is great. But I see no reason the hardware has to make changes to
>> the existing tables or any of this. Before we used 'bridge fdb' and 'ip
>> link' now we can use bridge tools more effectively and can deprecate
>> the overloaded use of ip. But again I see no reason to thrash the
>> forwarding state of the switch because we happen to be adding VFs.
>> Having a set of fdb rules to forward MAC/Vlan pairs (as we do now)
>> seems like a perfectly reasonable default. Add with this patch now
>> when I run 'fdb show' I can see the defaults.
>>
>> Maybe I'm reading to much into the devlink flag names and if instead
>> you use a switch like the following,
>>
>>    VF representer : enable/disable the creation VF netdev's to represent
>>                     the virtual functions on the PF
>>
>>
>> Much less complicated then magic switching between forwarding logic IMO
>> and you don't whack a default configuration that an entire stack (e.g.
>> libvirt) has been built to use.
> 
> 
> John,
> 
> I'll try to address here the core questions and arguments you brought.
> 

thanks. Also just to reiterate I really like the series just a few
details here.

> Re letting the user to observe/modify the rules added by the
> driver/firmware while legacy mode. Even if possible with bridge/fdb, it
> will be really pragmatical and doesn't make sense to get that donefor
> the TC subsystem. So this isn't a well defined solution and anyway, as
> you said, legacy mode enhancements is a different exercise. Personally,
> I agree with Jiri, that we should legacy be legacyand focus on adding
> the new model.
> 

The ixgbe driver already supports bridge and tc commands without the VF
representer.  Adding the VF representer to these drivers just extends
the existing support so we have an identifier for VFs and now the
redirect action works and the fdb commands can specify the VF netdevs.
I don't see this as a problem because we already do it today with
'ip' and bridge tools.

We are also slightly in disagreement about what the default should be
with VF netdevs. I think the default should be the same L2 mac/vlan
switch behavior and see no reason to change it by default just because
we added VF netdevs. The infrastructure libvirt/openstack/etc are built
around this default today. But I guess nothing in this series specifies
what the defaults of any given driver will be. VF netdevs are still
useful even on older hardware that only supports mac/vlan forwarding to
expose statistics and send/receive control frames such as lldp.

> The new model has few building blocks, and by all means, have the VF
> representors is not the full story, which is not magic but rather the
> following:
> 
> 1. VF (vport) representors netdevices + the needed mechanics
> (send-to-vport rules that makes xmit on VF rep --> recv on VF)
> 

We all agree on this. For me this should be its own knob VF netdevs or
no VF netdevs.

There is also my point that this is really a port attribute of the PCIe
configuration not a switch attribute.

> 2. handling HW data-patch misses --> send to CPU or drop

Yep need this also but we have a standard way to configure this already
with bridge and 'tc' so why have a toggle for it? Also you don't know
in the driver where I want to send missed packets. In some use cases I
have the VM is managing the system and in these cases I want to send
missed packets to a VF.

In ixgbe we get this for free (with the vf identifier netdevs) because
we have 'tc' and 'bridge' already hooked up. With 'tc' you can define
a wild card match with low priority and with 'bridge' model you can
setup the flood ports to do this.

> 
> 3. ability to offload SW rules (tc/bridge/etc) using VF representors and
> ingress qdiscs / bridge fdb rules / switchdev fdb rule, etc
> 
> The knob we suggested says that the system is put into a state where
> 1,2,3 are needed to make it full performance and functional one. This
> submission includes parts 1 and 2, so the offloading of SW rules will
> done in successive submission which uses TC offloads which are already
> upstream (u32 or flower).
> 
> So... we're almost in agreement, do you have another name for the knob
> that goes beyond creation/deletion of VF reps? maybe that would be it
> for making a progress...

The sticking point for me is (2) is not needed if you do (3)
correctly. So once you have implemented bridge and one of the 'tc'
classifiers that can be used to specify the policy in (2) and you don't
have a chunk policy being defined by the driver writer.

Just to put out an alternative if you add an ethtool feature flag 'VF
representer' so that I can specify enable/disable of VFs per port that
would resolve my concerns.

If you have this additional switch in devlink to hammer the datapath
between two switch modes that seems OK but I'm not sure who else other
than mlx drivers would use it. Additionally if you just used this
devlink hook to set the feature flag on each port and made it 'fixed'
from an ethtool perspective that would work for me as well. Then on
my devices that support VF representers per port I can configure it
and on the devices that can only do it globally it is configured with
this devlink thing.

Why I think the VF representer is a per port ethtool flag and not a
devlink option is my use case might be to assign a PF into a VM or
namespace where I don't want VF netdevs.

Thanks,
.John


> 
> Or.
> 

  reply	other threads:[~2016-06-29 16:42 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-06-27 16:07 [PATCH net-next 00/16] Mellanox 100G SRIOV E-Switch offload and VF representors Saeed Mahameed
2016-06-27 16:07 ` [PATCH net-next 01/16] net/mlx5: E-Switch, Add operational mode to the SRIOV e-Switch Saeed Mahameed
2016-06-27 16:07 ` [PATCH net-next 02/16] net/mlx5: E-Switch, Add support for the sriov offloads mode Saeed Mahameed
2016-06-27 16:07 ` [PATCH net-next 03/16] net/mlx5: E-Switch, Add miss rule for " Saeed Mahameed
2016-06-27 16:53   ` Sergei Shtylyov
2016-06-27 20:40     ` Or Gerlitz
2016-06-27 16:07 ` [PATCH net-next 04/16] net/mlx5: E-Switch, Add API to create send-to-vport rules Saeed Mahameed
2016-06-27 16:07 ` [PATCH net-next 05/16] net/mlx5: Introduce offloads steering namespace Saeed Mahameed
2016-06-27 16:07 ` [PATCH net-next 06/16] net/mlx5: E-Switch, Add offloads table Saeed Mahameed
2016-06-27 16:07 ` [PATCH net-next 07/16] net/mlx5: E-Switch, Add API to create vport rx rules Saeed Mahameed
2016-06-27 16:07 ` [PATCH net-next 08/16] net/devlink: Add E-Switch mode control Saeed Mahameed
2016-06-28  5:57   ` John Fastabend
2016-06-28 10:25     ` Or Gerlitz
2016-06-28 16:19       ` John Fastabend
2016-06-28 17:19         ` John Fastabend
2016-06-28 18:46           ` Jiri Pirko
2016-06-28 19:04             ` Samudrala, Sridhar
2016-06-28 19:12               ` Jiri Pirko
2016-06-28 19:31                 ` John Fastabend
2016-06-29 14:48                   ` Or Gerlitz
2016-06-29 16:35                     ` John Fastabend [this message]
2016-06-29 21:33                       ` Or Gerlitz
2016-06-29 22:09                         ` John Fastabend
2016-06-30  3:35                           ` John Fastabend
2016-06-30  4:04                             ` John Fastabend
2016-06-30  6:25                               ` Jiri Pirko
2016-06-30  7:13                                 ` Samudrala, Sridhar
2016-06-30  7:41                                   ` Jiri Pirko
2016-06-30  7:57                                     ` John Fastabend
2016-06-30 10:52                                       ` Jiri Pirko
2016-06-30 14:24                                         ` Or Gerlitz
2016-06-30 15:40                                         ` John Fastabend
2016-06-30 15:53                                           ` Jiri Pirko
2016-06-30 16:29                                             ` John Fastabend
2016-06-29  9:44         ` Or Gerlitz
2016-06-28 12:27   ` Jiri Pirko
2016-06-27 16:07 ` [PATCH net-next 09/16] net/mlx5: Add devlink interface Saeed Mahameed
2016-06-27 16:07 ` [PATCH net-next 10/16] net/mlx5e: Add devlink based SRIOV mode changes (legacy --> offloads) Saeed Mahameed
2016-06-28 13:42   ` Andy Gospodarek
2016-06-28 14:25     ` Or Gerlitz
2016-06-28 14:49       ` Andy Gospodarek
2016-06-27 16:07 ` [PATCH net-next 11/16] net/mlx5e: Create NIC global resources only once Saeed Mahameed
2016-06-27 16:07 ` [PATCH net-next 12/16] net/mlx5e: TIRs management refactoring Saeed Mahameed
2016-06-27 16:07 ` [PATCH net-next 13/16] net/mlx5e: Mark enabled RQTs instances explicitly Saeed Mahameed
2016-06-27 16:07 ` [PATCH net-next 14/16] net/mlx5e: Add support for multiple profiles Saeed Mahameed
2016-06-27 16:07 ` [PATCH net-next 15/16] net/mlx5: Add Representors registration API Saeed Mahameed
2016-06-27 16:07 ` [PATCH net-next 16/16] net/mlx5e: Introduce SRIOV VF representors Saeed Mahameed

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5773F8CC.4090300@gmail.com \
    --to=john.fastabend@gmail.com \
    --cc=Ankertal@mellanox.com \
    --cc=davem@davemloft.net \
    --cc=gospo@cumulusnetworks.com \
    --cc=hadarh@mellanox.com \
    --cc=idosch@mellanox.com \
    --cc=jesse.brandeburg@intel.com \
    --cc=jiri@mellanox.com \
    --cc=jiri@resnulli.us \
    --cc=john.r.fastabend@intel.com \
    --cc=netdev@vger.kernel.org \
    --cc=ogerlitz@mellanox.com \
    --cc=saeedm@mellanox.com \
    --cc=sridhar.samudrala@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.