From mboxrd@z Thu Jan 1 00:00:00 1970 From: Jakub Kicinski Subject: Re: [PATCH RFC net-next 00/19] Devlink health reporting and recovery system Date: Mon, 31 Dec 2018 17:47:27 -0800 Message-ID: <20181231174727.6280fc3c@cakuba.hsd1.ca.comcast.net> References: <1546266733-9512-1-git-send-email-eranbe@mellanox.com> Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Cc: netdev@vger.kernel.org, "David S. Miller" , Jiri Pirko , Moshe Shemesh , Aya Levin , Tal Alon , Ariel Almog To: Eran Ben Elisha Return-path: Received: from mail-pl1-f195.google.com ([209.85.214.195]:44594 "EHLO mail-pl1-f195.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728704AbfAABrd (ORCPT ); Mon, 31 Dec 2018 20:47:33 -0500 Received: by mail-pl1-f195.google.com with SMTP id e11so13113298plt.11 for ; Mon, 31 Dec 2018 17:47:32 -0800 (PST) In-Reply-To: <1546266733-9512-1-git-send-email-eranbe@mellanox.com> Sender: netdev-owner@vger.kernel.org List-ID: On Mon, 31 Dec 2018 16:31:54 +0200, Eran Ben Elisha wrote: > The health mechanism is targeted for Real Time Alerting, in order to know when > something bad had happened to a PCI device > - Provide alert debug information > - Self healing > - If problem needs vendor support, provide a way to gather all needed debugging > information. > > The main idea is to unify and centralize driver health reports in the > generic devlink instance and allow the user to set different > attributes of the health reporting and recovery procedures. > > The devlink health reporter: > Device driver creates a "health reporter" per each error/health type. > Error/Health type can be a known/generic (eg pci error, fw error, rx/tx error) > or unknown (driver specific). > For each registered health reporter a driver can issue error/health reports > asynchronously. All health reports handling is done by devlink. > Device driver can provide specific callbacks for each "health reporter", e.g. > - Recovery procedures > - Diagnostics and object dump procedures > - OOB initial attributes > Different parts of the driver can register different types of health reporters > with different handlers. > > Once an error is reported, devlink health will do the following actions: > * A log is being send to the kernel trace events buffer > * Health status and statistics are being updated for the reporter instance > * Object dump is being taken and saved at the reporter instance (as long as > there is no other Objdump which is already stored) > * Auto recovery attempt is being done. Depends on: > - Auto-recovery configuration > - Grace period vs. time passed since last recover > > The user interface: > User can access/change each reporter attributes and driver specific callbacks > via devlink, e.g per error type (per health reporter) > - Configure reporter's generic attributes (like: Disable/enable auto recovery) > - Invoke recovery procedure > - Run diagnostics > - Object dump Thanks for continuing this work! > The devlink health interface (via netlink): > DEVLINK_CMD_HEALTH_REPORTER_GET > Retrieves status and configuration info per DEV and reporter. > DEVLINK_CMD_HEALTH_REPORTER_SET > Allows reporter-related configuration setting. > DEVLINK_CMD_HEALTH_REPORTER_RECOVER > Triggers a reporter's recovery procedure. > DEVLINK_CMD_HEALTH_REPORTER_DIAGNOSE > Retrieves diagnostics data from a reporter on a device. > DEVLINK_CMD_HEALTH_REPORTER_OBJDUMP_GET The addition of "objdump" and its marshalling is a bit disappointing. It seemed to me when region snapshots were added that they would serve this exact purpose. Taking a region snapshot or "core dump", if you will, after something went south. Now it's confusing to have two mechanisms serving the same purpose. It's unclear from quick reading of the code how if the buffers get timestamped. Can you report more than one? About the marshalling, I'm not sure it belongs in the kernel. There is precedent for adding interpretation of FW blobs in user space (ethtool). IMHO it's a more scalable approach, if we want to avoid having 100 kLoC drivers. Amount of code you add to print the simple example from last patch is not inspiring confidence. And on the bike shedding side :) -> I think you should steer clear of calling this objdump, as it has very little to do with the functionality of well-known objdump tool. Its not even clear what the object the name is referring to is. Long story short the overlap with region snapshots makes it unclear what purpose either serves, and IMHO we should avoid the marshalling by teaching user space how to interpret snapshots. Preferably we only have one dump mechanism, and user space can be taught the interpretation once. > Retrieves the last stored objdump. Devlink health > saves a single objdump. If an objdump is not already stored by the devlink > for this reporter, devlink generates a new objdump. > Objdump output is defined by the reporter. > DEVLINK_CMD_HEALTH_REPORTER_OBJDUMP_CLEAR > Clears the last saved objdump file for the specified reporter. > > > netlink > +--------------------------+ > | | > | + | > | | | > +--------------------------+ > |request for ops > |(diagnose, > mlx5_core devlink |recover, > |dump) > +--------+ +--------------------------+ > | | | reporter| | > | | | +---------v----------+ | > | | ops execution | | | | > | <----------------------------------+ | | > | | | | | | > | | | + ^------------------+ | > | | | | request for ops | > | | | | (recover, dump) | > | | | | | > | | | +-+------------------+ | > | | health report | | health handler | | > | +-------------------------------> | | > | | | +--------------------+ | > | | health reporter create | | > | +----------------------------> | > +--------+ +--------------------------+ > > Available reporters: > In this patchset, three reporters of mlx5e driver are included. The FW > reporters implement devlink_health_reporter diagnostic, object dump and > recovery procedures. The TX reporter implements devlink_health_reporter > diagnostic and recovery procedures. > > This RFC is based on the same concepts as previous RFC I sent earlier this > year: "[RFC PATCH iproute2-next] System specification health API". The API was > changed, also devlink code and mlx5e reporters were not available at the > previous RFC.