netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Yuval Avnery <yuvalav@mellanox.com>
To: Parav Pandit <parav@mellanox.com>,
	Andy Gospodarek <andrew.gospodarek@broadcom.com>
Cc: Jakub Kicinski <jakub.kicinski@netronome.com>,
	Jiri Pirko <jiri@mellanox.com>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	Saeed Mahameed <saeedm@mellanox.com>,
	"leon@kernel.org" <leon@kernel.org>,
	"davem@davemloft.net" <davem@davemloft.net>,
	"shuah@kernel.org" <shuah@kernel.org>,
	Daniel Jurgens <danielj@mellanox.com>,
	"michael.chan@broadcom.com" <michael.chan@broadcom.com>
Subject: RE: [PATCH net-next v2 00/10] devlink subdev
Date: Thu, 14 Nov 2019 16:47:43 +0000	[thread overview]
Message-ID: <AM6PR05MB51424FC005A5454E3638E77CC5710@AM6PR05MB5142.eurprd05.prod.outlook.com> (raw)
In-Reply-To: <AM0PR05MB4866553530461C933DA3181ED1760@AM0PR05MB4866.eurprd05.prod.outlook.com>

Andy, 
Jakub,

Thank you for your comments

I will send another version with the following enhancements:

- Support multiple hw_addr.
- Describe more future use cases.
- Describe mdevs creation with devlink subdev. (this can be used for VFs, once PCIe spec supports)


> -----Original Message-----
> From: Parav Pandit
> Sent: Tuesday, November 12, 2019 10:37 PM
> To: Andy Gospodarek <andrew.gospodarek@broadcom.com>; Yuval Avnery
> <yuvalav@mellanox.com>
> Cc: Jakub Kicinski <jakub.kicinski@netronome.com>; Jiri Pirko
> <jiri@mellanox.com>; netdev@vger.kernel.org; Saeed Mahameed
> <saeedm@mellanox.com>; leon@kernel.org; davem@davemloft.net;
> shuah@kernel.org; Daniel Jurgens <danielj@mellanox.com>;
> michael.chan@broadcom.com
> Subject: RE: [PATCH net-next v2 00/10] devlink subdev
> 
> Hi Andy,
> 
> > From: Andy Gospodarek <andrew.gospodarek@broadcom.com>
> > Sent: Tuesday, November 12, 2019 8:56 AM On Mon, Nov 11, 2019 at
> > 06:46:52PM +0000, Yuval Avnery wrote:
> > >
> > > >
> [..]
> 
> >
> > It feels a bit 'unconstrained' to me as well.  As Jakub said you added
> > some documentation, but the direction of this long-term is not clear.
> 
> I am interested to know if the long-term direction of the devlink was clear
> enough in cover letter [1], that devlink device will change net namespace, it
> will have health reporters, dump regions, eswitch VF and PF ports,
> representor binding, "enable_roce" extensions.
> Because that vision is not clear in the cover letter [1] at least to me.
> 
> Resource division was very clearly described in (4) in cover letter [1] which
> you were fan of from 2016 in [2].
> MSIX or in general interrupts resource splitting among VFs is no different
> than any other resource division.
> 
> devlink eswitch port for anchoring interrupt attribute is certainly abuse of an
> 'esw port' object.
> Defining a subdev object, even though today it holds just 2 attributes
> (hw_address, and msix interrupt), it still sane object hierarchy.
> Setting host side mac address is not really job of eswitch ports.
> It may be on the same ASIC, but it should be put at right object.
> 
> > What seems to happen too often is that we skip creating better infra
> > for existing devices and create it only for the newest shiniest object.
> This infrastructure is for existing devices (non smartnic devices too).
> May be worth clarifying explicitly in cover letter.
> 
> > I'm not sure if that is part of what bothers Jakub, but it is one
> > thing that bothers me and why this feels incomplete.
> Let's work towards making it complete as much as for given functionality.
> 
> >
> > The thing that has been bouncing around in my head about this (and
> > until I was in front of a good text-based MUA I didn't dare respond)
> > is that we seem to have an overlap in functionality between what you
> > are proposing and existing virtual device configuration, but you are
> > completely ignoring improving upon the existing creation methods.
> > Whether dealing with a SmartNIC (which I used to describe a NIC with
> > general purpose processors that could be running Linux and I think you
> > do, too) or a regular NIC it seems like we should use devlink to
> > create and configure a variety of devices these could be:
> >
> > 1.  Number of PFs (there are cases where more than 1 PF per uplink port
> >     is required for command/control of the SmartNIC or where a single
> >     PCI b/d/f may have many uplink ports that need to be addressed
> >     separately)
> How is this related to subdev object? Do you mean user should be able to
> create PF subdevice object for host?
> Currently decision to create subdev objects is done by the vendor driver.
> But it is not prevented in future to extend it, if it make sense.
> You need to describe the usecase and usual review process..
> 
> > 2.  Device-specific SR-IOV VFs visible to server 3.  mdev devices that
> > _may_ have representers on the embedded cores of
> >     the NIC
> There is no mdev NIC devices on embedded cores NIC driver exist in kernel
> today.
> I posted the series [3] to cover this part and we worked through kvm and
> netdev mailing list to get right, deterministic phys_port_name for it, right
> devlink port flavour etc necessary plumbing. This was done 2 months before
> posting [3].
> So mdev's subdev management is covered and cover letter also talks about it
> to manage it in uniform way as PF/VF/mdev.
> 
> > 4.  Hardware VirtIO-net devices supported
> How this is blocked by subdev object introduction?
> The way I envision from the recent discussion is, There will be mdev-virtio or
> virtio bus where hardware virto devices will be exposed.
> And therefore, their subdevices can be also controlled this way.
> But my understanding of unpublished mellanox virtio-net device and yours
> could be different. :-) So if you have link/ppt/rfc/patches I should read,
> please let me know, how you see hw virtio-net devices.
> 
> > 5.  Other non-network devices
> > (storage given as the first example) 6.  ...
> >
> I think its really beyond the scope of life cycling everything using this subdev
> object.
> It is not a ultimate solution to all use cases. :-)
> 
> > We cannot get rid of the methods for creating hardware VFs via sysfs,
> > but now that we are seeing lots of different flavors of devices that
> > might be created we should start by making sure at a minimum we can
> > create existing hardware devices (VFs) with this devlink interface and move
> from there.
> Well, I have been thinking about it for last 6-7 months as there is locking
> issues present between num_vfs sysfs (device lock!), ip link (no lock) and
> devlink (devlink lock).
> All unsyncronized with each other which leads to call traces; using some lock
> in multiple drivers will be a red bandage.
> It is not the scope of this series.
> 
> > Is there a plan
> > to use this interface to create subdevs that are VFs or just new subdev
> flavors?
> Creating individual VFs is supported by PCI spec when I checked last time.
> Do you know if that is supported now?
> Last year autoprobe file was added to pci core to dynamically bind the VFs
> and bug was fixed recently by Alex Williamson.
> 
> Let's say if that is added in future, even than subdev object creation on event
> of dynamic VF creation holds good. Isn't it?
> Whenever mdev series updated version gets accepted, it will dynamically
> create the subdev object anyway of mdev or some other flavour.
> 
> > I could start to get behind an interface like this if the patches
> > showed that devlink would be the one place where device creation and
> > control could be done for all types of subdevs, but that isn't clear
> > that it is your direction based on the patches.
> >
> > So just to make sure this is clear, what I'm proposing that devlink is
> > used to create and configure all types of devices that it may be
> > possible to create, configure, or address from a PF and the starting
> > place should be standard, hardware VFs.
> I do not agree with 'creating all devices' part that it should be the starting
> point given that other method already exists for VFs, mdev (though its buggy
> for VFs).
> Extending devlink to create mdev is nice, but not a must. Same goes for other
> flavours such as virtio/sf.
> I also proposed that to create them via devlink in [4], but its not must.
> This series for mdev [3] already fits in the existing devlink, eswitch, devlink
> port, devlink subdev object model.
> 
> > If that can be done right and we can address some of the issues with
> > the current implementation (more than one hw addr is a _great_
> > example) then adding new flavours for the server devices and and
> > adding new flavors for SmartNIC-based devices should be easy and feel
> natural.
> I also agree that setting more than one hw address should be supported,
> Jakub highlighted too of this need for Intel.
> Command should be similar to 'ip addr add', and not 'devlink set hw_addr'.
> Mlx5 doesn't currently supported so setting multiple hw_address should
> return right error code ENOSPEC or ENOSUPP by mlx5 driver.
> 
> Yuval, Can you please address this comment from Jakub and Andy to support
> multiple hw_address?
> 
> I just want to say that subdev is not so grand object.
> As use case, features, ASIC evolves right attributes should be placed in right
> object.
> (just like how devlink evolved from [1] to now.) Aren't we reviewing the
> patches? Why should it become a unconstrained object to put everything in
> it?
> This series doesn't ask to put everything in it.
> 
> As Yuval pointed, there are two/three near term plan after this series.
> 1. Grouping VFs subdevs so that host side min/max rates can be applied on
> the group (group of VFs) 2. Extending subdev flavour for mdev/virtio/sf
> depending on how series [3] take shape 3. msix configuration for VFs
> 
> [1] https://lwn.net/Articles/677967/
> [2] https://www.spinics.net/lists/netdev/msg365532.html
> [3] https://lore.kernel.org/linux-rdma/20191107160448.20962-1-
> parav@mellanox.com/
> [4] https://lore.kernel.org/linux-
> rdma/AM0PR05MB4866444210721BC4EE775D27D17B0@AM0PR05MB4866.e
> urprd05.prod.outlook.com/
> 


      reply	other threads:[~2019-11-14 16:47 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-11-08 16:18 [PATCH net-next v2 00/10] devlink subdev Yuval Avnery
2019-11-08 16:18 ` [PATCH net-next v2 01/10] devlink: Introduce subdev Yuval Avnery
2019-11-08 16:18 ` [PATCH net-next v2 02/10] devlink: Add PCI attributes support for subdev Yuval Avnery
2019-11-08 16:18 ` [PATCH net-next v2 03/10] devlink: Add port with subdev register support Yuval Avnery
2019-11-08 16:18 ` [PATCH net-next v2 04/10] devlink: Support subdev HW address get Yuval Avnery
2019-11-08 18:00   ` Parav Pandit
2019-11-09 17:45     ` Yuval Avnery
2019-11-09 18:13       ` Parav Pandit
2019-11-10  8:12         ` Jiri Pirko
2019-11-08 16:18 ` [PATCH net-next v2 05/10] devlink: Support subdev HW address set Yuval Avnery
2019-11-08 16:18 ` [PATCH net-next v2 06/10] Documentation: Add devlink-subdev documentation Yuval Avnery
2019-11-08 16:18 ` [PATCH net-next v2 07/10] netdevsim: Add max_vfs to bus_dev Yuval Avnery
2019-11-08 16:18 ` [PATCH net-next v2 08/10] netdevsim: Add devlink subdev creation Yuval Avnery
2019-11-08 16:18 ` [PATCH net-next v2 09/10] netdevsim: Add devlink subdev sefltest for netdevsim Yuval Avnery
2019-11-08 16:18 ` [PATCH net-next v2 10/10] net/mlx5e: Add support for devlink subdev and subdev hw_addr set/show Yuval Avnery
2019-11-11 18:00 ` [PATCH net-next v2 00/10] devlink subdev Jakub Kicinski
2019-11-11 18:46   ` Yuval Avnery
2019-11-12 14:55     ` Andy Gospodarek
2019-11-12 18:35       ` Yuval Avnery
2019-11-12 22:19         ` Jakub Kicinski
2019-11-12 23:32           ` Yuval Avnery
2019-11-13  6:37       ` Parav Pandit
2019-11-14 16:47         ` Yuval Avnery [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=AM6PR05MB51424FC005A5454E3638E77CC5710@AM6PR05MB5142.eurprd05.prod.outlook.com \
    --to=yuvalav@mellanox.com \
    --cc=andrew.gospodarek@broadcom.com \
    --cc=danielj@mellanox.com \
    --cc=davem@davemloft.net \
    --cc=jakub.kicinski@netronome.com \
    --cc=jiri@mellanox.com \
    --cc=leon@kernel.org \
    --cc=michael.chan@broadcom.com \
    --cc=netdev@vger.kernel.org \
    --cc=parav@mellanox.com \
    --cc=saeedm@mellanox.com \
    --cc=shuah@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).