All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC net-next 0/8] Introducing subdev bus and devlink extension
@ 2019-03-01  5:37 Parav Pandit
  2019-03-01  5:37 ` [RFC net-next 1/8] subdev: Introducing subdev bus Parav Pandit
                   ` (8 more replies)
  0 siblings, 9 replies; 72+ messages in thread
From: Parav Pandit @ 2019-03-01  5:37 UTC (permalink / raw)
  To: netdev, linux-kernel, michal.lkml, davem, gregkh, jiri; +Cc: parav

Use case:
---------
A user wants to create/delete hardware linked sub devices without
using SR-IOV.
These devices for a pci device can be netdev (optional rdma device)
or other devices. Such sub devices share some of the PCI device
resources and also have their own dedicated resources.

Few examples are:
1. netdev having its own txq(s), rq(s) and/or hw offload parameters.
2. netdev with switchdev mode using netdev representor
3. rdma device with IB link layer and IPoIB netdev
4. rdma/RoCE device and a netdev
5. rdma device with multiple ports

Requirements for above use cases:
--------------------------------
1. We need a generic user interface & core APIs to create sub devices
from a parent pci device but should be generic enough for other parent
devices
2. Interface should be vendor agnostic
3. User should be able to set device params at creation time
4. In future if needed, tool should be able to create passthrough
device to map to a virtual machine
5. A device can have multiple ports
6. An orchestration software wants to know how many such sub devices
can be created from a parent device so that it can manage them in global
cluster resources.

So how is it done?
------------------
(a) user in control
To address above requirements, a generic tool iproute2/devlink is
extended for sub device's life cycle.
However a devlink tool and its kernel counter part is not sufficient
to create protocol agnostic devices on a existing PCI bus.

(b) subdev bus
A given bus defines well defined addressing scheme. Creating sub devices
on existing PCI bus with a different naming scheme is just weird.
So, creating well named devices on appropriate bus is desired.

Hence a new 'subdev' bus is created.
User adds/removes new sub devices subdev on this bus via a devlink tool.
devlink tool instructs hardware driver to create/remove/configure
such devices. Hardware vendor driver places devices on the bus.
Another or same vendor driver matches based on vendor-id, device-id
scheme and run through classic device driver model.

Given that, these are user created devices for a given hardware and in
absence of a central entity like PCISIG to assign vendor and device ids,
A unique vendor and device id are maintained as enum in
include/linux/subdev_ids.h.

subdev bus device names follow default device naming scheme of Linux
kernel. It is done as 'subdev<instance_id>' such as, subdev0, subdev3.

subdev device inherits its parent's DMA parameters.
subdev will follow rich power management infrastructure of core kernel/
So that every vendor driver doesn't have to iterate over its child
devices, invent a locking and device anchoring scheme.

Patchset summary:
-----------------
Patch-1, 2 introduces a subdev bus and interface for subdev life cycle.
Patch-3 extends modpost tool for module device id table.
Patch-4,5,6 implements a devlink vendor driver to add/remove devices.
Patch-7 mlx5 driver implements subdev devices and places them on subdev
bus. 
Patch-8 match against the subdev for mlx5 vendor, device id and creates
fake netdevice.

All patches are only a reference implementation to see RFC in works
at devlink, sysfs and device model level. Once RFC looks good, more
solid upstreamable version of the implementation will be done.
All patches are functional except the last two patches, which just
create fake subdev devices and fake netdevice.

System example view:
--------------------

$ devlink dev show
pci/0000:05:00.0

$ devlink dev add pci/0000:05:00.0
$ devlink dev show
pci/0000:05:00.0
subdev/subdev0

sysfs view with subdev:

$ ls -l /sys/bus/pci/devices/0000:05:00.0
[..]
drwxr-xr-x 3 root root        0 Feb 13 15:57 infiniband
-rw-r--r-- 1 root root     4096 Feb 13 15:57 msi_bus
drwxr-xr-x 3 root root        0 Feb 13 15:57 net
drwxr-xr-x 2 root root        0 Feb 13 15:57 power
drwxr-xr-x 3 root root        0 Feb 13 15:57 ptp
drwxr-xr-x 4 root root        0 Feb 13 15:57 subdev0

$ ls -l /sys/bus/pci/devices/0000:05:00.0/subdev0
lrwxrwxrwx 1 root root    0 Feb 13 15:58 driver -> ../../../../../bus/subdev/drivers/mlx5_core
drwxr-xr-x 3 root root    0 Feb 13 15:58 net
drwxr-xr-x 2 root root    0 Feb 13 15:58 power
lrwxrwxrwx 1 root root    0 Feb 13 15:58 subsystem -> ../../../../../bus/subdev
-rw-r--r-- 1 root root 4096 Feb 13 15:58 uevent

$ ls -l /sys/bus/pci/devices/0000:05:00.0/subdev0/net/
drwxr-xr-x 5 root root 0 Feb 13 15:58 eth0

Software view:
-------------
Some of you if you prefer to see in picture, below diagram tries to
show software modules in bus/device hierarchy.

devlink user (iproute2/devlink)
------------------------------
         |
         |
  +----------------+
  | devlink module |
  |         doit() |     +------------------+
  |            |   |     |   vendor driver  |
  +------------|---+     |   (mlx5)         |
               ----------+-> subdev_ops()   |
                         +|-----------------+
                          |
                +---------|--+  +-----------+  +------------------+
                | subdev bus |  | core      |  | subdev device    |
                | driver     |  | kernel    |  | drivers          |
                | (add/del)  |  | dev model |  | (netdev, rdma)   |
                |       ----------------------> probe/remove()    |
                +------------+  +-----------+  +------------------+

Alternatives considered:
------------------------
Will discuss separately if needed to keep this RFC short.


Parav Pandit (8):
  subdev: Introducing subdev bus
  subdev: Introduce pm callbacks
  modpost: Add support for subdev device id table
  devlink: Introduce and use devlink_init/cleanup() in alloc/free
  devlink: Add variant of devlink_register/unregister
  devlink: Add support for devlink subdev lifecycle
  net/mlx5: Add devlink subdev life cycle command support
  net/mlx5: Add subdev driver to bind to subdev devices

 drivers/Kconfig                                    |   2 +
 drivers/Makefile                                   |   1 +
 drivers/net/ethernet/mellanox/mlx5/core/Makefile   |   1 +
 drivers/net/ethernet/mellanox/mlx5/core/main.c     |  12 +-
 .../net/ethernet/mellanox/mlx5/core/mlx5_core.h    |   7 +
 drivers/net/ethernet/mellanox/mlx5/core/subdev.c   |  55 ++++++
 .../ethernet/mellanox/mlx5/core/subdev_driver.c    |  93 +++++++++
 drivers/subdev/Kconfig                             |  12 ++
 drivers/subdev/Makefile                            |   8 +
 drivers/subdev/subdev_main.c                       | 212 +++++++++++++++++++++
 include/linux/mod_devicetable.h                    |  12 ++
 include/linux/subdev_bus.h                         |  63 ++++++
 include/linux/subdev_ids.h                         |  17 ++
 include/net/devlink.h                              |  29 ++-
 include/uapi/linux/devlink.h                       |   3 +
 net/core/devlink.c                                 | 179 +++++++++++++++--
 scripts/mod/devicetable-offsets.c                  |   4 +
 scripts/mod/file2alias.c                           |  15 ++
 18 files changed, 704 insertions(+), 21 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/subdev.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/subdev_driver.c
 create mode 100644 drivers/subdev/Kconfig
 create mode 100644 drivers/subdev/Makefile
 create mode 100644 drivers/subdev/subdev_main.c
 create mode 100644 include/linux/subdev_bus.h
 create mode 100644 include/linux/subdev_ids.h

-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 72+ messages in thread

end of thread, other threads:[~2021-06-10  7:17 UTC | newest]

Thread overview: 72+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-03-01  5:37 [RFC net-next 0/8] Introducing subdev bus and devlink extension Parav Pandit
2019-03-01  5:37 ` [RFC net-next 1/8] subdev: Introducing subdev bus Parav Pandit
2019-03-01  7:17   ` Greg KH
2019-03-01 16:35     ` Parav Pandit
2019-03-01 17:00       ` Greg KH
2019-03-26 11:48     ` Lorenzo Pieralisi
2019-03-01  5:37 ` [RFC net-next 2/8] subdev: Introduce pm callbacks Parav Pandit
2019-03-01  5:37 ` [RFC net-next 3/8] modpost: Add support for subdev device id table Parav Pandit
2019-03-01  5:37 ` [RFC net-next 4/8] devlink: Introduce and use devlink_init/cleanup() in alloc/free Parav Pandit
2019-03-01  5:37 ` [RFC net-next 5/8] devlink: Add variant of devlink_register/unregister Parav Pandit
2019-03-01  5:37 ` [RFC net-next 6/8] devlink: Add support for devlink subdev lifecycle Parav Pandit
2019-03-01  5:37 ` [RFC net-next 7/8] net/mlx5: Add devlink subdev life cycle command support Parav Pandit
2019-03-01  7:18   ` Greg KH
2019-03-01 16:04     ` Parav Pandit
2019-03-01  5:37 ` [RFC net-next 8/8] net/mlx5: Add subdev driver to bind to subdev devices Parav Pandit
2019-03-01  7:21   ` Greg KH
2019-03-01 17:21     ` Parav Pandit
2019-03-05  7:13       ` Greg KH
2019-03-05 17:57         ` Parav Pandit
2019-03-05 19:27           ` Greg KH
2019-03-05 21:37             ` Parav Pandit
2019-03-01 22:12   ` Saeed Mahameed
2019-03-04 16:45     ` Parav Pandit
2019-03-01 20:03 ` [RFC net-next 0/8] Introducing subdev bus and devlink extension Jakub Kicinski
2019-03-04  4:41   ` Parav Pandit
2019-03-05  1:35     ` Jakub Kicinski
2019-03-05 19:46       ` Parav Pandit
2019-03-05 22:39         ` Kirti Wankhede
2019-03-05 23:17           ` Parav Pandit
2019-03-05 23:44             ` Parav Pandit
2019-03-06  0:44               ` Parav Pandit
2019-03-06  3:51                 ` Kirti Wankhede
2019-03-06  5:42                   ` Parav Pandit
2019-03-07 19:04                     ` Kirti Wankhede
2019-03-07 20:27                       ` Parav Pandit
2019-03-07 20:53                         ` Kirti Wankhede
2019-03-07 21:02                           ` Parav Pandit
2019-03-07 21:07                             ` Kirti Wankhede
2019-03-07 21:21                               ` Parav Pandit
2019-03-07 22:01                                 ` Kirti Wankhede
2019-03-07 22:31                                   ` Parav Pandit
2019-03-08 12:19                                     ` Kirti Wankhede
2019-03-08 17:09                                       ` Parav Pandit
2019-03-05  1:45     ` Jakub Kicinski
2019-03-05 16:52       ` Parav Pandit
2021-05-31 10:36         ` moyufeng
2021-06-01  5:37           ` Jakub Kicinski
2021-06-01  7:33             ` Yunsheng Lin
2021-06-01 21:34               ` Jakub Kicinski
2021-06-02  2:24                 ` Yunsheng Lin
2021-06-02 16:34                   ` Jakub Kicinski
2021-06-03  3:46                     ` Yunsheng Lin
2021-06-03 17:53                       ` Jakub Kicinski
2021-06-04  1:18                         ` Yunsheng Lin
2021-06-04 18:41                           ` Jakub Kicinski
2021-06-07  1:36                             ` Yunsheng Lin
2021-06-07 19:46                               ` Jakub Kicinski
2021-06-08 12:10                                 ` Yunsheng Lin
2021-06-08 17:29                                   ` Jakub Kicinski
2021-06-09  9:16                                     ` Yunsheng Lin
2021-06-09  9:38                                       ` Parav Pandit
2021-06-09 11:05                                         ` Yunsheng Lin
2021-06-09 11:59                                           ` Parav Pandit
2021-06-09 12:30                                             ` Yunsheng Lin
2021-06-09 13:45                                               ` Parav Pandit
2021-06-10  7:04                                                 ` Yunsheng Lin
2021-06-10  7:17                                                   ` Parav Pandit
2021-06-09 16:40                                       ` Jakub Kicinski
2021-06-10  6:52                                         ` Yunsheng Lin
2021-06-09  9:52                                   ` Parav Pandit
2021-06-09 11:16                                     ` Yunsheng Lin
2021-06-09 12:00                                       ` Parav Pandit

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.