netdev.vger.kernel.org archive mirror
* [pull request][net-next 00/15] Mellanox, mlx5 Firmware devlink health and sw reset
@ 2019-05-05  0:32 Saeed Mahameed
From: Saeed Mahameed @ 2019-05-05  0:32 UTC
  To: David S. Miller; +Cc: netdev, Jiri Pirko, Saeed Mahameed

Hi Dave,

This series adds support for mlx5 firmware devlink health reporters and
software reset.

We plan to follow up this series with a patch that adds mlx5
documentation under Documentation/networking/mlx5.rst, first thing in
the 5.3 kernel release. It will cover all the new mlx5 devlink options
and more.

For more information, please see the tag log below.

Please pull and let me know if there is any problem.

Thanks,
Saeed.

---
The following changes since commit a734d1f4c2fc962ef4daa179e216df84a8ec5f84:

  net: openvswitch: return an error instead of doing BUG_ON() (2019-05-04 01:36:36 -0400)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git tags/mlx5-updates-2019-05-04

for you to fetch changes up to 30d8b932dcebbcb8c5d1991cab5325c2e3faad6d:

  net/mlx5: Report devlink health on FW fatal issues (2019-05-04 17:22:45 -0700)

----------------------------------------------------------------
mlx5-updates-2019-05-04

Mlx5 devlink health fw reporters and sw reset support

This series provides mlx5 firmware reset support and firmware devlink health
reporters.

1) Add CR-Space access and FW Crdump snapshot support via devlink region_snapshot

2) Issue software reset upon FW asserts

3) Add fw and fw_fatal devlink health reporters that follow fw error
indications with dump and recover procedures, and let the user trigger this
functionality.

3.1) fw reporter:
The fw reporter implements diagnose and dump callbacks.
It follows symptoms of a fw error, such as a fw syndrome, by triggering a
fw core dump and storing it, along with any other fw trace, in the dump buffer.
The fw reporter diagnose command can be triggered at any time by the user to
check the current fw status.

3.2) fw_fatal reporter:
The fw_fatal reporter implements dump and recover callbacks.
It follows fatal error indications with a CR-space dump and a recover flow.
The CR-space dump uses the vsc interface, which remains valid even if the FW
command interface is not functional, as is the case in most FW fatal errors.
The CR-space dump is stored as a memory region snapshot so that it is easy to
read by address.
The recover function runs the recover flow, which reloads the driver and
triggers a fw reset if needed.
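
As a rough illustration of how a user might trigger the callbacks described
in 3.1 and 3.2, the sketch below only builds `devlink health` command lines
as strings; it does not run anything. The helper name `health_cmd` is
hypothetical, and the device handle is an assumption based on the PCI address
that appears in the trace output of this log. The reporter names and the
callbacks each one implements are taken from the text above.

```python
# Hypothetical helper: builds devlink(8) command lines as strings, runs nothing.
# Assumed device handle, using the PCI address seen in the trace output.
DEV = "pci/0000:82:00.1"

# Callbacks per reporter, as described in sections 3.1 and 3.2.
SUPPORTED = {
    "fw": {"diagnose", "dump show"},
    "fw_fatal": {"dump show", "recover"},
}


def health_cmd(action, reporter, dev=DEV):
    """Return a 'devlink health' command line for one reporter callback."""
    if reporter not in SUPPORTED:
        raise ValueError(f"unknown reporter: {reporter}")
    if action not in SUPPORTED[reporter]:
        raise ValueError(f"reporter {reporter} does not implement {action}")
    return f"devlink health {action} {dev} reporter {reporter}"


if __name__ == "__main__":
    print(health_cmd("diagnose", "fw"))
    print(health_cmd("dump show", "fw"))
    print(health_cmd("recover", "fw_fatal"))
```

The table of supported actions simply mirrors the description above, so a
request such as recover on the fw reporter is rejected before any command
line is produced.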

Command examples and output:
diagnose data:
assert_var[0] 0xfc3fc043
assert_var[1] 0x0001b41c
assert_var[2] 0x00000000
assert_var[3] 0x00000000
assert_var[4] 0x00000000
assert_exit_ptr 0x008033b4
assert_callra 0x0080365c
fw_ver 16.24.1000
hw_id 0x0000020d
irisc_index 0
synd 0x8: unrecoverable hardware error
ext_synd 0x003d
raw fw_ver 0x101803e8
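
The two version fields in the diagnose output above appear to encode the same
value. The sketch below shows one way to relate them, assuming the raw word
packs major, minor and sub-minor as 8, 8 and 16 bits. That layout is inferred
only from this one sample and is not stated by the series; `decode_fw_ver` is
a hypothetical name.

```python
def decode_fw_ver(raw):
    """Split a 32-bit raw fw_ver word into 'major.minor.subminor'.

    Assumed layout (inferred from the sample output, not from the driver):
    bits 31..24 major, bits 23..16 minor, bits 15..0 sub-minor.
    """
    major = (raw >> 24) & 0xFF
    minor = (raw >> 16) & 0xFF
    subminor = raw & 0xFFFF
    return f"{major}.{minor}.{subminor}"


if __name__ == "__main__":
    # Raw value copied from the diagnose output above.
    print(decode_fw_ver(0x101803E8))
```

Under that assumption, the raw value 0x101803e8 from the sample decodes to
16.24.1000, which matches the fw_ver line.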

dump traces:
   trace: 0000:82:00.1 [0x69cd6c5283e] 0 [0xb8] dump general info GVMI=0x0001
   trace: 0000:82:00.1 [0x69cd6c53bec] 0 [0xb8] GVMI management info, gvmi_management context:
   trace: 0000:82:00.1 [0x69cd6c55eff] 0 [0xb8] [000]:  00000000  00000000  00000000  00000000
   trace: 0000:82:00.1 [0x69cd6c5657f] 0 [0xb8] [010]:  00000000  00000000  00000000  00000000
   trace: 0000:82:00.1 [0x69cd6c56608] 0 [0xb8] [020]:  00000000  00000000  00000000  00000000
   trace: 0000:82:00.1 [0x69cd6c566ff] 0 [0xb8] [030]:  00000000  00000000  00000000  00000000
   trace: 0000:82:00.1 [0x69cd6c5677f] 0 [0xb8] [040]:  00000000  00000000  00000000  00000000
   trace: 0000:82:00.1 [0x69cd6c5687f] 0 [0xb8] [050]:  00000000  00000000  00000000  00000000
   trace: 0000:82:00.1 [0x69cd6c568ff] 0 [0xb8] [060]:  00000000  00000000  00000000  00000000
   trace: 0000:82:00.1 [0x69cd6c569a5] 0 [0xb8] [070]:  00000000  00000000  00000000  00000000
   trace: 0000:82:00.1 [0x69cd6c57021] 0 [0xb8] CMDIF dbase from IRON: active_dbase_slots = 0x00000000
   trace: 0000:82:00.1 [0x69cd6c58dae] 0 [0xb8] GVMI=0x0001 hw_toc context:
   trace: 0000:82:00.1 [0x69cd6c58e7f] 0 [0xb8] [000]:  00400100  00000000  00000000  fffff000
   trace: 0000:82:00.1 [0x69cd6c58f7f] 0 [0xb8] [010]:  00000000  00000000  00000000  00000000
...
...

devlink_region_name: cr-space snapshot_id: 1

00000000000f0018 e1 03 00 00 fb ae a9 3f

0000000000000000 00 20 00 01 00 00 00 00 03 00 00 00 00 00 00 00
0000000000000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0000000000000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0000000000000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0000000000000040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 80
0000000000000050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0000000000000060 00 00 00 00 00 00 00 00 00 00 00 00 de 0a 00 00
0000000000000070 0c 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0000000000000080 00 00 00 00 00 00 00 00 00 00 00 00 00 00 fa 00
0000000000000090 b6 0b 00 00 00 00 00 00 80 c7 fe ff 50 0a 00 00
...
...
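
To illustrate why a region snapshot is convenient to read by address, here is
a minimal sketch that parses dump lines of the form shown above (an address
followed by hex bytes) into an address-to-byte map. The function names are
hypothetical, the line format is assumed from this sample only, and no byte
order is assumed for multi-byte values: bytes are returned in dump order.

```python
def parse_region_dump(text):
    """Map each absolute address to its byte, from 'ADDR b0 b1 ...' lines."""
    mem = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # blank lines and "..." elisions
        try:
            base = int(fields[0], 16)
            data = bytes(int(b, 16) for b in fields[1:])
        except ValueError:
            continue  # non-hex lines such as the region name header
        for offset, byte in enumerate(data):
            mem[base + offset] = byte
    return mem


def read_bytes(mem, addr, length):
    """Return 'length' bytes starting at 'addr', in dump order."""
    return bytes(mem[addr + i] for i in range(length))


if __name__ == "__main__":
    # Lines copied from the snapshot output above.
    sample = (
        "devlink_region_name: cr-space snapshot_id: 1\n"
        "00000000000f0018 e1 03 00 00 fb ae a9 3f\n"
        "0000000000000060 00 00 00 00 00 00 00 00 00 00 00 00 de 0a 00 00\n"
        "...\n"
    )
    mem = parse_region_dump(sample)
    print(read_bytes(mem, 0xF0018, 4).hex())
    print(read_bytes(mem, 0x6C, 2).hex())
```

Reading an address that is not covered by the parsed lines raises KeyError,
which keeps gaps in a partial dump visible instead of silently returning
zeros.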

----------------------------------------------------------------
Alex Vesker (3):
      net/mlx5: Add Vendor Specific Capability access gateway
      net/mlx5: Add Crdump FW snapshot support
      net/mlx5: Add support for devlink region_snapshot parameter

Eran Ben Elisha (1):
      net/mlx5: Move all devlink related functions calls to devlink.c

Feras Daoud (3):
      net/mlx5: Handle SW reset of FW in error flow
      net/mlx5: Control CR-space access by different PFs
      net/mlx5: Issue SW reset on FW assert

Moshe Shemesh (8):
      net/mlx5: Refactor print health info
      net/mlx5: Create FW devlink health reporter
      net/mlx5: Add core dump register access functions
      net/mlx5: Add support for FW reporter dump
      net/mlx5: Report devlink health on FW issues
      net/mlx5: Add fw fatal devlink health reporter
      net/mlx5: Add support for FW fatal reporter dump
      net/mlx5: Report devlink health on FW fatal issues

 drivers/net/ethernet/mellanox/mlx5/core/Makefile   |   3 +-
 drivers/net/ethernet/mellanox/mlx5/core/devlink.c  |  72 +++
 drivers/net/ethernet/mellanox/mlx5/core/devlink.h  |  12 +
 .../net/ethernet/mellanox/mlx5/core/diag/crdump.c  | 210 ++++++++
 .../ethernet/mellanox/mlx5/core/diag/fw_tracer.c   | 143 +++++
 .../ethernet/mellanox/mlx5/core/diag/fw_tracer.h   |  14 +
 .../net/ethernet/mellanox/mlx5/core/en_selftest.c  |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/health.c   | 575 +++++++++++++++++----
 drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h |   6 +
 .../net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c  | 313 +++++++++++
 .../net/ethernet/mellanox/mlx5/core/lib/pci_vsc.h  |  33 ++
 drivers/net/ethernet/mellanox/mlx5/core/main.c     |  19 +-
 .../net/ethernet/mellanox/mlx5/core/mlx5_core.h    |   8 +-
 include/linux/mlx5/device.h                        |  10 +-
 include/linux/mlx5/driver.h                        |  20 +-
 include/linux/mlx5/mlx5_ifc.h                      |  17 +-
 16 files changed, 1357 insertions(+), 100 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/devlink.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/devlink.h
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/diag/crdump.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.h


Thread overview: 34+ messages
2019-05-05  0:32 [pull request][net-next 00/15] Mellanox, mlx5 Firmware devlink health and sw reset Saeed Mahameed
2019-05-05  0:32 ` [net-next 01/15] net/mlx5: Move all devlink related functions calls to devlink.c Saeed Mahameed
2019-05-05  0:32 ` [net-next 02/15] net/mlx5: Add Vendor Specific Capability access gateway Saeed Mahameed
2019-05-05  0:33 ` [net-next 03/15] net/mlx5: Add Crdump FW snapshot support Saeed Mahameed
2019-05-05 15:36   ` Jiri Pirko
2019-05-05  0:33 ` [net-next 04/15] net/mlx5: Add support for devlink region_snapshot parameter Saeed Mahameed
2019-05-05  0:33 ` [net-next 05/15] net/mlx5: Handle SW reset of FW in error flow Saeed Mahameed
2019-05-05  0:33 ` [net-next 06/15] net/mlx5: Control CR-space access by different PFs Saeed Mahameed
2019-05-05  0:33 ` [net-next 07/15] net/mlx5: Issue SW reset on FW assert Saeed Mahameed
2019-05-05 15:38   ` Jiri Pirko
2019-05-06 10:44     ` Moshe Shemesh
2019-05-05  0:33 ` [net-next 08/15] net/mlx5: Refactor print health info Saeed Mahameed
2019-05-05 15:42   ` Jiri Pirko
2019-05-05  0:33 ` [net-next 09/15] net/mlx5: Create FW devlink health reporter Saeed Mahameed
2019-05-05 15:42   ` Jiri Pirko
2019-05-06 10:45     ` Moshe Shemesh
2019-05-06 11:38       ` Jiri Pirko
2019-05-06 19:52         ` Saeed Mahameed
2019-05-06 21:46           ` Alexei Starovoitov
2019-05-07  5:59             ` Jiri Pirko
2019-05-07  6:01           ` Jiri Pirko
2019-05-07  0:11   ` Jakub Kicinski
2019-05-05  0:33 ` [net-next 10/15] net/mlx5: Add core dump register access functions Saeed Mahameed
2019-05-05  0:33 ` [net-next 11/15] net/mlx5: Add support for FW reporter dump Saeed Mahameed
2019-05-05 15:49   ` Jiri Pirko
2019-05-06 10:51     ` Moshe Shemesh
2019-05-06 11:37       ` Jiri Pirko
2019-05-05  0:33 ` [net-next 12/15] net/mlx5: Report devlink health on FW issues Saeed Mahameed
2019-05-05  0:33 ` [net-next 13/15] net/mlx5: Add fw fatal devlink health reporter Saeed Mahameed
2019-05-05  0:33 ` [net-next 14/15] net/mlx5: Add support for FW fatal reporter dump Saeed Mahameed
2019-05-05 15:52   ` Jiri Pirko
2019-05-06 10:54     ` Moshe Shemesh
2019-05-06 11:42       ` Jiri Pirko
2019-05-05  0:33 ` [net-next 15/15] net/mlx5: Report devlink health on FW fatal issues Saeed Mahameed
