linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/5] mlx5 ConnectX diagnostic misc driver
@ 2023-10-18  8:19 Saeed Mahameed
  2023-10-18  8:19 ` [PATCH 1/5] mlx5: Add aux dev for ctl interface Saeed Mahameed
                   ` (5 more replies)
  0 siblings, 6 replies; 36+ messages in thread
From: Saeed Mahameed @ 2023-10-18  8:19 UTC
  To: Arnd Bergmann, Greg Kroah-Hartman
  Cc: linux-kernel, Leon Romanovsky, Jason Gunthorpe, Jiri Pirko,
	Saeed Mahameed

From: Saeed Mahameed <saeedm@nvidia.com>

Hello Greg and Arnd,

The ConnectX HW family supported by the mlx5 drivers uses an architecture
where a FW component executes "mailbox RPCs" issued by the driver to make
changes to the device. This results in a complex debugging environment
where the FW component holds information and complex low level state that
needs to be accessible to userspace for debugging purposes.

Historically a userspace program was used that accessed the PCI register
and config space directly through /sys/bus/pci/.../XXX and could operate
these debugging interfaces in parallel with the running driver.
This approach is incompatible with secure boot and kernel lockdown, so this
series provides a secure and restricted interface instead.

To solve this we add a misc driver "mlx5ctl" that interfaces with the
mlx5_core ConnectX driver to access the underlying device debug
information.

1) The first patch in the series introduces the main driver file with the
implementation of a new mlx5 auxiliary device driver that runs on top of
mlx5_core device instances. On probe it creates a new misc device, and
this patch implements the open and release fops. On open the driver
allocates a special FW UID (user context ID) restricted to debug RPCs
only, and all user debug RPCs are executed under this UID. On release
the UID is freed.

2) The second patch adds an info ioctl that shows the allocated UID,
the available capability masks of the device and of the current UID, and
some other useful device information such as the underlying ConnectX
device.

Example:
    $ mlx5ctl mlx5_core.ctl.0
    mlx5dev: 0000:00:04.0
    UCTX UID: 1
    UCTX CAP: 0x3
    DEV UCTX CAP: 0x3
    USER CAP: 0x1d

3) The third patch adds an RPC ioctl to execute debug RPCs under the
special UID.

In the mlx5 architecture the FW RPC commands are of the format of
inbox and outbox buffers. The inbox buffer contains the command
rpc layout as described in the ConnectX Programmers Reference Manual
(PRM) document and as defined in include/linux/mlx5/mlx5_ifc.h.

On success the user outbox buffer will be filled with the device's rpc
response.

For example, to query device capabilities, a user fills out an inbox
buffer with the layout:
    struct mlx5_ifc_query_hca_cap_in_bits
and expects an outbox buffer with the layout:
    struct mlx5_ifc_cmd_hca_cap_bits
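
As a rough illustration of the inbox side, the sketch below packs the
leading fields of a QUERY_HCA_CAP inbox by hand. It assumes the layout
from mlx5_ifc.h (a 16-bit opcode at bit offset 0x00, a 16-bit op_mod at
bit offset 0x30, big-endian fields, a 16-byte inbox) and the opcode value
0x100. The helper names are hypothetical and are not part of this series;
real tools would more likely use MLX5_SET() style accessors.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed values based on mlx5_ifc.h; verify against the PRM before use. */
#define QUERY_HCA_CAP_OPCODE   0x100u /* MLX5_CMD_OP_QUERY_HCA_CAP */
#define QUERY_HCA_CAP_IN_BYTES 16u    /* size of query_hca_cap_in_bits */

/* Hypothetical helper: store a 16-bit value big-endian at a byte offset. */
static void put_be16(uint8_t *buf, size_t off, uint16_t val)
{
	buf[off] = (uint8_t)(val >> 8);
	buf[off + 1] = (uint8_t)(val & 0xff);
}

/*
 * Hypothetical helper: zero the buffer, then fill in the opcode and
 * op_mod fields of a QUERY_HCA_CAP inbox. Returns the inbox length.
 */
static size_t build_query_hca_cap_inbox(uint8_t *inbox, size_t len,
					uint16_t op_mod)
{
	assert(len >= QUERY_HCA_CAP_IN_BYTES);
	memset(inbox, 0, len);
	put_be16(inbox, 0, QUERY_HCA_CAP_OPCODE); /* opcode: bits 0x00..0x0f */
	put_be16(inbox, 6, op_mod);               /* op_mod: bits 0x30..0x3f */
	return QUERY_HCA_CAP_IN_BYTES;
}
```

The resulting buffer is what a tool would hand to the rpc ioctl as the
inbox; the ioctl argument structure itself is not shown here.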

4) The fourth patch adds the ability to register user memory into the
ConnectX device and create a umem object that points to that memory.

The command rpc outbox buffer is limited in size, which can be very
annoying when trying to pull large traces out of the device.
Many rpcs offer the ability to scatter output traces, contexts
and logs directly into user space buffers in a single shot.

The registered/pinned memory is described by a device UMEM object
which has a unique umem_id. This umem_id can later be used in the rpc
inbox to tell the device where to write the response output,
e.g. HW traces and other debug object queries.
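
A minimal userspace sketch of preparing such a buffer follows. It only
covers the allocation step and assumes the buffer should be page aligned
and a whole number of pages long, which is a common requirement for pinned
memory but is an assumption here. alloc_umem_buffer() is a hypothetical
helper; the registration ioctl and its argument layout are not shown.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: allocate a zeroed, page-aligned buffer of at least
 * len bytes, rounded up to whole pages. The rounded size is stored in
 * *out_len. Returns NULL on failure; the caller releases it with free().
 */
static void *alloc_umem_buffer(size_t len, size_t *out_len)
{
	long page = sysconf(_SC_PAGESIZE);
	size_t psz, rounded;
	void *buf = NULL;

	assert(out_len != NULL);
	if (page <= 0 || len == 0)
		return NULL;

	psz = (size_t)page;
	rounded = (len + psz - 1) / psz * psz; /* round up to whole pages */
	if (posix_memalign(&buf, psz, rounded) != 0)
		return NULL;

	memset(buf, 0, rounded);
	*out_len = rounded;
	return buf;
}
```

The returned address and length are what would then be passed to the
umem registration ioctl to obtain a umem_id.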

Example use case: a ConnectX device coredump can be as large as 2MB.
Using inline rpcs, it takes thousands of rpcs to get the full
coredump, which can take multiple seconds.

With UMEM, it can be done in a single rpc, using 2MB of umem user buffer.
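
The arithmetic behind this comparison can be sketched as a ceiling
division. The per-rpc payload size used in the note below (512 bytes) is
a made-up figure for illustration only; the real outbox limit is not
stated in this letter. rpcs_needed() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: number of rpcs needed to move total bytes when
 * each rpc can return at most per_rpc bytes of payload.
 */
static size_t rpcs_needed(size_t total, size_t per_rpc)
{
	assert(per_rpc > 0);
	return (total + per_rpc - 1) / per_rpc; /* ceiling division */
}
```

With the assumed 512-byte payload, rpcs_needed(2 * 1024 * 1024, 512)
gives 4096 rpcs, while a single 2MB umem buffer gives
rpcs_needed(2 * 1024 * 1024, 2 * 1024 * 1024) == 1.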

Other use cases with umem:
  - dynamic HW and FW trace monitoring
  - high frequency diagnostic counters sampling
  - batched objects and resource dumps

See links below for information about user-space tools that use this
interface:

[1] https://github.com/saeedtx/mlx5ctl

[2] https://github.com/Mellanox/mstflint
see:
    d) mstregdump utility
      This utility dumps hardware registers from Mellanox hardware
      for later analysis by Mellanox.

    g) mstconfig
      This tool sets or queries non-volatile configurable options

    i) mstreg
      The mlxreg utility allows users to obtain information
      regarding supported access registers, such as their fields

Saeed Mahameed (5):
  mlx5: Add aux dev for ctl interface
  misc: mlx5ctl: Add mlx5ctl misc driver
  misc: mlx5ctl: Add info ioctl
  misc: mlx5ctl: Add command rpc ioctl
  misc: mlx5ctl: Add umem reg/unreg ioctl

 .../userspace-api/ioctl/ioctl-number.rst      |   1 +
 drivers/misc/Kconfig                          |   1 +
 drivers/misc/Makefile                         |   1 +
 drivers/misc/mlx5ctl/Kconfig                  |  14 +
 drivers/misc/mlx5ctl/Makefile                 |   5 +
 drivers/misc/mlx5ctl/main.c                   | 528 ++++++++++++++++++
 drivers/misc/mlx5ctl/umem.c                   | 325 +++++++++++
 drivers/misc/mlx5ctl/umem.h                   |  17 +
 drivers/net/ethernet/mellanox/mlx5/core/dev.c |   8 +
 include/uapi/misc/mlx5ctl.h                   |  51 ++
 10 files changed, 951 insertions(+)
 create mode 100644 drivers/misc/mlx5ctl/Kconfig
 create mode 100644 drivers/misc/mlx5ctl/Makefile
 create mode 100644 drivers/misc/mlx5ctl/main.c
 create mode 100644 drivers/misc/mlx5ctl/umem.c
 create mode 100644 drivers/misc/mlx5ctl/umem.h
 create mode 100644 include/uapi/misc/mlx5ctl.h

-- 
2.41.0



end of thread, other threads:[~2023-11-19  9:49 UTC | newest]

Thread overview: 36+ messages
2023-10-18  8:19 [PATCH 0/5] mlx5 ConnectX diagnostic misc driver Saeed Mahameed
2023-10-18  8:19 ` [PATCH 1/5] mlx5: Add aux dev for ctl interface Saeed Mahameed
2023-10-18  8:19 ` [PATCH 2/5] misc: mlx5ctl: Add mlx5ctl misc driver Saeed Mahameed
2023-10-18  8:30   ` Greg Kroah-Hartman
2023-10-18  8:49     ` Leon Romanovsky
2023-10-18  8:55       ` Greg Kroah-Hartman
2023-10-18 10:00         ` Leon Romanovsky
2023-10-18 11:52           ` Greg Kroah-Hartman
2023-10-18 18:01     ` Jason Gunthorpe
2023-10-18 18:22       ` Greg Kroah-Hartman
2023-10-18 18:56         ` Jason Gunthorpe
2023-10-19 17:21           ` Greg Kroah-Hartman
2023-10-19 19:00             ` Jason Gunthorpe
2023-10-19 19:46               ` Greg Kroah-Hartman
2023-10-19 23:49                 ` Jason Gunthorpe
2023-10-20 20:17                   ` Greg Kroah-Hartman
2023-10-19 21:50             ` Dual licensing [was: [PATCH 2/5] misc: mlx5ctl: Add mlx5ctl misc driver] Jonathan Corbet
2023-10-20 19:30               ` Dave Airlie
2023-10-20 20:07               ` Greg Kroah-Hartman
2023-10-18  8:30   ` [PATCH 2/5] misc: mlx5ctl: Add mlx5ctl misc driver Greg Kroah-Hartman
2023-10-18  8:19 ` [PATCH 3/5] misc: mlx5ctl: Add info ioctl Saeed Mahameed
2023-10-18  9:02   ` Arnd Bergmann
2023-10-18 10:08     ` Leon Romanovsky
2023-10-18 11:02       ` Arnd Bergmann
2023-10-22  1:46   ` kernel test robot
2023-10-22 11:27   ` kernel test robot
2023-10-18  8:19 ` [PATCH 4/5] misc: mlx5ctl: Add command rpc ioctl Saeed Mahameed
2023-10-18  8:19 ` [PATCH 5/5] misc: mlx5ctl: Add umem reg/unreg ioctl Saeed Mahameed
2023-10-18  8:33   ` Greg Kroah-Hartman
2023-11-19  9:49     ` Saeed Mahameed
2023-10-18  9:30   ` Arnd Bergmann
2023-10-18 11:51     ` Jason Gunthorpe
2023-11-19  9:44     ` Saeed Mahameed
2023-10-18  8:31 ` [PATCH 0/5] mlx5 ConnectX diagnostic misc driver Greg Kroah-Hartman
2023-10-18 12:00   ` Jason Gunthorpe
2023-10-18 12:11     ` Greg Kroah-Hartman
