From mboxrd@z Thu Jan 1 00:00:00 1970
From: Aya Levin
To: David Ahern , netdev@vger.kernel.org, "David S.
Miller" , Jiri Pirko
Cc: Moshe Shemesh , Aya Levin , Eran Ben Elisha , Tal Alon , Ariel Almog
Subject: [PATCH for-next 4/4] devlink: add health command support
Date: Sun, 10 Feb 2019 20:28:49 +0200
Message-Id: <1549823329-10377-5-git-send-email-ayal@mellanox.com>
X-Mailer: git-send-email 1.8.4.3
In-Reply-To: <1549823329-10377-1-git-send-email-ayal@mellanox.com>
References: <1549532202-943-1-git-send-email-eranbe@mellanox.com> <1549823329-10377-1-git-send-email-ayal@mellanox.com>
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: netdev@vger.kernel.org

This patch adds support for the following commands:
  devlink health show [DEV reporter REPORTER_NAME]
  devlink health recover DEV reporter REPORTER_NAME
  devlink health diagnose DEV reporter REPORTER_NAME
  devlink health dump show DEV reporter REPORTER_NAME
  devlink health dump clear DEV reporter REPORTER_NAME
  devlink health set DEV reporter REPORTER_NAME NAME VALUE

* show: Devlink health show displays status and configuration info for a
  specific reporter on a device, or dumps the info for all reporters on
  all devices.
* recover: Devlink health recover enables the user to initiate a recovery
  on a reporter. This operation will increment the recoveries counter
  displayed in the show command.
* diagnose: Devlink health diagnose enables the user to retrieve
  diagnostics data on a reporter on a device. The command's output is
  free text defined by the reporter.
* dump show: Devlink health dump show displays the last saved dump.
  Devlink health saves a single dump per reporter. If a dump is not
  already stored by devlink for this reporter, devlink generates a new
  dump. The dump can be generated automatically when a reporter reports
  an error, or manually at the user's request. The dump output is defined
  by the reporter.
* dump clear: Devlink health dump clear deletes the last saved dump.
* set: Devlink health set enables the user to configure:
  1) grace_period [msec] time interval between auto recoveries.
2) auto_recover [true/false] whether the devlink should execute automatic recover on error. Examples: $devlink health show pci/0000:00:09.0 reporter tx pci/0000:00:09.0: name tx state healthy #err 0 #recover 1 last_dump_ts N/A parameters: grace period 600 auto_recover true $devlink health diagnose pci/0000:00:09.0 reporter tx SQs: sqn: 4283 HW state: 1 stopped: false sqn: 4288 HW state: 1 stopped: false sqn: 4293 HW state: 1 stopped: false sqn: 4298 HW state: 1 stopped: false sqn: 4303 HW state: 1 stopped: false $devlink health dump show pci/0000:00:09.0 reporter tx TX dump data $devlink health dump clear pci/0000:00:09.0 reporter tx $devlink health set pci/0000:00:09.0 reporter tx grace_period 3500 $devlink health set pci/0000:00:09.0 reporter tx auto_recover false Signed-off-by: Aya Levin Reviewed-by: Moshe Shemesh --- devlink/devlink.c | 551 ++++++++++++++++++++++++++++++++++++++++++- include/uapi/linux/devlink.h | 23 ++ man/man8/devlink-health.8 | 176 ++++++++++++++ man/man8/devlink.8 | 7 +- 4 files changed, 755 insertions(+), 2 deletions(-) create mode 100644 man/man8/devlink-health.8 diff --git a/devlink/devlink.c b/devlink/devlink.c index a433883f1b2b..2ddbf389a3ea 100644 --- a/devlink/devlink.c +++ b/devlink/devlink.c @@ -22,6 +22,8 @@ #include #include #include +#include +#include #include "SNAPSHOT.h" #include "list.h" @@ -41,6 +43,10 @@ #define PARAM_CMODE_PERMANENT_STR "permanent" #define DL_ARGS_REQUIRED_MAX_ERR_LEN 80 +#define HEALTH_REPORTER_STATE_HEALTHY_STR "healthy" +#define HEALTH_REPORTER_STATE_ERROR_STR "error" +#define HEALTH_REPORTER_TIMESTAMP_FORMAT_LEN 80 + static int g_new_line_count; #define pr_err(args...) 
fprintf(stderr, ##args) @@ -200,6 +206,9 @@ static void ifname_map_free(struct ifname_map *ifname_map) #define DL_OPT_REGION_SNAPSHOT_ID BIT(22) #define DL_OPT_REGION_ADDRESS BIT(23) #define DL_OPT_REGION_LENGTH BIT(24) +#define DL_OPT_HEALTH_REPORTER_NAME BIT(25) +#define DL_OPT_HEALTH_REPORTER_GRACEFUL_PERIOD BIT(26) +#define DL_OPT_HEALTH_REPORTER_AUTO_RECOVER BIT(27) struct dl_opts { uint32_t present; /* flags of present items */ @@ -231,6 +240,9 @@ struct dl_opts { uint32_t region_snapshot_id; uint64_t region_address; uint64_t region_length; + const char *reporter_name; + uint64_t reporter_graceful_period; + bool reporter_auto_recover; }; struct dl { @@ -391,6 +403,17 @@ static const enum mnl_attr_data_type devlink_policy[DEVLINK_ATTR_MAX + 1] = { [DEVLINK_ATTR_INFO_VERSION_STORED] = MNL_TYPE_NESTED, [DEVLINK_ATTR_INFO_VERSION_NAME] = MNL_TYPE_STRING, [DEVLINK_ATTR_INFO_VERSION_VALUE] = MNL_TYPE_STRING, + [DEVLINK_ATTR_FMSG] = MNL_TYPE_NESTED, + [DEVLINK_ATTR_FMSG_OBJ_NAME] = MNL_TYPE_STRING, + [DEVLINK_ATTR_FMSG_OBJ_VALUE_TYPE] = MNL_TYPE_U8, + [DEVLINK_ATTR_HEALTH_REPORTER] = MNL_TYPE_NESTED, + [DEVLINK_ATTR_HEALTH_REPORTER_NAME] = MNL_TYPE_STRING, + [DEVLINK_ATTR_HEALTH_REPORTER_STATE] = MNL_TYPE_U8, + [DEVLINK_ATTR_HEALTH_REPORTER_ERR] = MNL_TYPE_U64, + [DEVLINK_ATTR_HEALTH_REPORTER_RECOVER] = MNL_TYPE_U64, + [DEVLINK_ATTR_HEALTH_REPORTER_DUMP_TS] = MNL_TYPE_U64, + [DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD] = MNL_TYPE_U64, + [DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER] = MNL_TYPE_U8, }; static int attr_cb(const struct nlattr *attr, void *data) @@ -822,6 +845,24 @@ static int dl_argv_uint16_t(struct dl *dl, uint16_t *p_val) return 0; } +static int dl_argv_bool(struct dl *dl, bool *p_val) +{ + char *str = dl_argv_next(dl); + int err; + + if (!str) { + pr_err("Boolean argument expected\n"); + return -EINVAL; + } + + err = strtobool(str, p_val); + if (err) { + pr_err("\"%s\" is not a valid boolean value\n", str); + return err; + } + return 0; +} + static 
int dl_argv_str(struct dl *dl, const char **p_str) { const char *str = dl_argv_next(dl); @@ -976,6 +1017,7 @@ static const struct dl_args_metadata dl_args_required[] = { {DL_OPT_REGION_SNAPSHOT_ID, "Region snapshot id expected.\n"}, {DL_OPT_REGION_ADDRESS, "Region address value expected.\n"}, {DL_OPT_REGION_LENGTH, "Region length value expected.\n"}, + {DL_OPT_HEALTH_REPORTER_NAME, "Reporter's name is expected.\n"}, }; static int validate_finding_required_dl_args(uint32_t o_required, @@ -1231,6 +1273,28 @@ static int dl_argv_parse(struct dl *dl, uint32_t o_required, if (err) return err; o_found |= DL_OPT_REGION_LENGTH; + } else if (dl_argv_match(dl, "reporter") && + (o_all & DL_OPT_HEALTH_REPORTER_NAME)) { + dl_arg_inc(dl); + err = dl_argv_str(dl, &opts->reporter_name); + if (err) + return err; + o_found |= DL_OPT_HEALTH_REPORTER_NAME; + } else if (dl_argv_match(dl, "grace_period") && + (o_all & DL_OPT_HEALTH_REPORTER_GRACEFUL_PERIOD)) { + dl_arg_inc(dl); + err = dl_argv_uint64_t(dl, + &opts->reporter_graceful_period); + if (err) + return err; + o_found |= DL_OPT_HEALTH_REPORTER_GRACEFUL_PERIOD; + } else if (dl_argv_match(dl, "auto_recover") && + (o_all & DL_OPT_HEALTH_REPORTER_AUTO_RECOVER)) { + dl_arg_inc(dl); + err = dl_argv_bool(dl, &opts->reporter_auto_recover); + if (err) + return err; + o_found |= DL_OPT_HEALTH_REPORTER_AUTO_RECOVER; } else { pr_err("Unknown option \"%s\"\n", dl_argv(dl)); return -EINVAL; @@ -1328,6 +1392,16 @@ static void dl_opts_put(struct nlmsghdr *nlh, struct dl *dl) if (opts->present & DL_OPT_REGION_LENGTH) mnl_attr_put_u64(nlh, DEVLINK_ATTR_REGION_CHUNK_LEN, opts->region_length); + if (opts->present & DL_OPT_HEALTH_REPORTER_NAME) + mnl_attr_put_strz(nlh, DEVLINK_ATTR_HEALTH_REPORTER_NAME, + opts->reporter_name); + if (opts->present & DL_OPT_HEALTH_REPORTER_GRACEFUL_PERIOD) + mnl_attr_put_u64(nlh, + DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD, + opts->reporter_graceful_period); + if (opts->present & DL_OPT_HEALTH_REPORTER_AUTO_RECOVER) 
+ mnl_attr_put_u8(nlh, DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER, + opts->reporter_auto_recover); } static int dl_argv_parse_put(struct nlmsghdr *nlh, struct dl *dl, @@ -5677,11 +5751,482 @@ static int cmd_region(struct dl *dl) return -ENOENT; } +static int cmd_health_set_params(struct dl *dl) +{ + struct nlmsghdr *nlh; + int err; + + nlh = mnlg_msg_prepare(dl->nlg, DEVLINK_CMD_HEALTH_REPORTER_SET, + NLM_F_REQUEST | NLM_F_ACK); + err = dl_argv_parse(dl, DL_OPT_HANDLE | DL_OPT_HEALTH_REPORTER_NAME, + DL_OPT_HEALTH_REPORTER_GRACEFUL_PERIOD | + DL_OPT_HEALTH_REPORTER_AUTO_RECOVER); + if (err) + return err; + + dl_opts_put(nlh, dl); + return _mnlg_socket_sndrcv(dl->nlg, nlh, NULL, NULL); +} + +static int cmd_health_dump_clear(struct dl *dl) +{ + struct nlmsghdr *nlh; + int err; + + nlh = mnlg_msg_prepare(dl->nlg, + DEVLINK_CMD_HEALTH_REPORTER_DUMP_CLEAR, + NLM_F_REQUEST | NLM_F_ACK); + + err = dl_argv_parse_put(nlh, dl, + DL_OPT_HANDLE | DL_OPT_HEALTH_REPORTER_NAME, 0); + if (err) + return err; + + dl_opts_put(nlh, dl); + return _mnlg_socket_sndrcv(dl->nlg, nlh, NULL, NULL); +} + +static int health_value_show(struct dl *dl, int type, struct nlattr *nl_data) +{ + const char *str; + uint8_t *data; + uint32_t len; + uint64_t val_u64; + uint32_t val_u32; + uint16_t val_u16; + uint8_t val_u8; + bool val_bool; + int i; + + switch (type) { + case MNL_TYPE_FLAG: + val_bool = mnl_attr_get_u8(nl_data); + if (dl->json_output) + jsonw_bool(dl->jw, !!val_bool); + else + pr_out("%s ", val_bool ? 
"true" : "false"); + break; + case MNL_TYPE_U8: + val_u8 = mnl_attr_get_u8(nl_data); + if (dl->json_output) + jsonw_uint(dl->jw, val_u8); + else + pr_out("%u ", val_u8); + break; + case MNL_TYPE_U16: + val_u16 = mnl_attr_get_u16(nl_data); + if (dl->json_output) + jsonw_uint(dl->jw, val_u16); + else + pr_out("%u ", val_u16); + break; + case MNL_TYPE_U32: + val_u32 = mnl_attr_get_u32(nl_data); + if (dl->json_output) + jsonw_uint(dl->jw, val_u32); + else + pr_out("%u ", val_u32); + break; + case MNL_TYPE_U64: + val_u64 = mnl_attr_get_u64(nl_data); + if (dl->json_output) + jsonw_u64(dl->jw, val_u64); + else + pr_out("%lu ", val_u64); + break; + case MNL_TYPE_NUL_STRING: + str = mnl_attr_get_str(nl_data); + if (dl->json_output) + jsonw_string(dl->jw, str); + else + pr_out("%s ", str); + break; + case MNL_TYPE_BINARY: + len = mnl_attr_get_payload_len(nl_data); + data = mnl_attr_get_payload(nl_data); + i = 0; + while (i < len) { + if (dl->json_output) { + jsonw_printf(dl->jw, "%d", data[i]); + } else { + pr_out("%02x ", data[i]); + if (!(i % 15)) + pr_out("\n"); + } + i++; + } + break; + default: + return -EINVAL; + } + return MNL_CB_OK; +} + +struct nest_qentry { + int attr_type; + + TAILQ_ENTRY(nest_qentry) nest_entries; +}; + +struct health_cb_data { + struct dl *dl; + uint8_t value_type; + + TAILQ_HEAD(, nest_qentry) qhead; +}; + +static int cmd_health_nest_queue(struct health_cb_data *health_data, + uint8_t *attr_value, bool insert) +{ + struct nest_qentry *entry = NULL; + + if (insert) { + entry = malloc(sizeof(struct nest_qentry)); + if (!entry) + return -ENOMEM; + + entry->attr_type = *attr_value; + TAILQ_INSERT_HEAD(&health_data->qhead, entry, nest_entries); + } else { + if (TAILQ_EMPTY(&health_data->qhead)) + return MNL_CB_ERROR; + entry = TAILQ_FIRST(&health_data->qhead); + *attr_value = entry->attr_type; + TAILQ_REMOVE(&health_data->qhead, entry, nest_entries); + free(entry); + } + return MNL_CB_OK; +} + +static int cmd_health_nest(struct health_cb_data 
*health_data, + uint8_t nest_value, bool start) +{ + struct dl *dl = health_data->dl; + uint8_t value = nest_value; + int err; + + err = cmd_health_nest_queue(health_data, &value, start); + if (err != MNL_CB_OK) + return err; + + switch (value) { + case DEVLINK_ATTR_FMSG_OBJ_NEST_START: + if (start) + pr_out_entry_start(dl); + else + pr_out_entry_end(dl); + break; + case DEVLINK_ATTR_FMSG_PAIR_NEST_START: + break; + case DEVLINK_ATTR_FMSG_ARR_NEST_START: + if (dl->json_output) { + if (start) + jsonw_start_array(dl->jw); + else + jsonw_end_array(dl->jw); + } else { + if (start) { + __pr_out_newline(); + __pr_out_indent_inc(); + } else { + __pr_out_indent_dec(); + } + } + break; + default: + return -EINVAL; + } + return MNL_CB_OK; +} + +static int cmd_health_object_cb(const struct nlmsghdr *nlh, void *data) +{ + struct genlmsghdr *genl = mnl_nlmsg_get_payload(nlh); + struct nlattr *tb[DEVLINK_ATTR_MAX + 1] = {}; + struct health_cb_data *health_data = data; + struct dl *dl = health_data->dl; + struct nlattr *nla_object; + const char *name; + int attr_type; + int err; + + mnl_attr_parse(nlh, sizeof(*genl), attr_cb, tb); + if (!tb[DEVLINK_ATTR_FMSG]) + return MNL_CB_ERROR; + + mnl_attr_for_each_nested(nla_object, tb[DEVLINK_ATTR_FMSG]) { + attr_type = mnl_attr_get_type(nla_object); + switch (attr_type) { + case DEVLINK_ATTR_FMSG_OBJ_NEST_START: + case DEVLINK_ATTR_FMSG_PAIR_NEST_START: + case DEVLINK_ATTR_FMSG_ARR_NEST_START: + err = cmd_health_nest(health_data, attr_type, true); + if (err != MNL_CB_OK) + return err; + break; + case DEVLINK_ATTR_FMSG_NEST_END: + err = cmd_health_nest(health_data, attr_type, false); + if (err != MNL_CB_OK) + return err; + break; + case DEVLINK_ATTR_FMSG_OBJ_NAME: + name = mnl_attr_get_str(nla_object); + if (dl->json_output) + jsonw_name(dl->jw, name); + else + pr_out("%s: ", name); + break; + case DEVLINK_ATTR_FMSG_OBJ_VALUE_TYPE: + health_data->value_type = mnl_attr_get_u8(nla_object); + break; + case DEVLINK_ATTR_FMSG_OBJ_VALUE_DATA: + 
err = health_value_show(dl, health_data->value_type, + nla_object); + if (err != MNL_CB_OK) + return err; + break; + default: + return -EINVAL; + } + } + return MNL_CB_OK; +} + +static int cmd_health_object_common(struct dl *dl, uint8_t cmd) +{ + struct nlmsghdr *nlh; + struct health_cb_data data; + int err; + + nlh = mnlg_msg_prepare(dl->nlg, cmd, + NLM_F_REQUEST | NLM_F_ACK); + + err = dl_argv_parse_put(nlh, dl, + DL_OPT_HANDLE | DL_OPT_HEALTH_REPORTER_NAME, 0); + if (err) + return err; + + data.dl = dl; + TAILQ_INIT(&data.qhead); + err = _mnlg_socket_sndrcv(dl->nlg, nlh, cmd_health_object_cb, &data); + return err; +} + +static int cmd_health_diagnose(struct dl *dl) +{ + return cmd_health_object_common(dl, DEVLINK_CMD_HEALTH_REPORTER_DIAGNOSE); +} + +static int cmd_health_dump_show(struct dl *dl) +{ + return cmd_health_object_common(dl, DEVLINK_CMD_HEALTH_REPORTER_DUMP_GET); +} + +static int cmd_health_recover(struct dl *dl) +{ + struct nlmsghdr *nlh; + int err; + + nlh = mnlg_msg_prepare(dl->nlg, DEVLINK_CMD_HEALTH_REPORTER_RECOVER, + NLM_F_REQUEST | NLM_F_ACK); + + err = dl_argv_parse_put(nlh, dl, + DL_OPT_HANDLE | DL_OPT_HEALTH_REPORTER_NAME, 0); + if (err) + return err; + + dl_opts_put(nlh, dl); + return _mnlg_socket_sndrcv(dl->nlg, nlh, NULL, NULL); +} + +enum devlink_health_reporter_state { + DEVLINK_HEALTH_REPORTER_STATE_HEALTHY, + DEVLINK_HEALTH_REPORTER_STATE_ERROR, +}; + +static const char *health_state_name(uint8_t state) +{ + switch (state) { + case DEVLINK_HEALTH_REPORTER_STATE_HEALTHY: + return HEALTH_REPORTER_STATE_HEALTHY_STR; + case DEVLINK_HEALTH_REPORTER_STATE_ERROR: + return HEALTH_REPORTER_STATE_ERROR_STR; + default: + return ""; + } +} + +static void format_logtime(uint64_t time_ms, char *output) +{ + struct sysinfo s_info; + struct tm *info; + time_t now, sec; + int err; + + time(&now); + info = localtime(&now); + err = sysinfo(&s_info); + if (err) + goto out; + /* subtract uptime in sec from now + * add time_ms and 5 minutes factor + */ +
sec = now - (s_info.uptime - time_ms / 1000) + 300; + info = localtime(&sec); +out: + strftime(output, HEALTH_REPORTER_TIMESTAMP_FORMAT_LEN, + "%Y-%b-%d %H:%M:%S", info); +} + +static void pr_out_health(struct dl *dl, struct nlattr **tb) +{ + char dump_time_date[HEALTH_REPORTER_TIMESTAMP_FORMAT_LEN] = "N/A"; + struct nlattr *hlt[DEVLINK_ATTR_MAX + 1] = {}; + enum devlink_health_reporter_state state; + bool auto_recover = false; + const struct nlattr *attr; + uint64_t time_ms; + int err; + + state = DEVLINK_HEALTH_REPORTER_STATE_HEALTHY; + + err = mnl_attr_parse_nested(tb[DEVLINK_ATTR_HEALTH_REPORTER], attr_cb, + hlt); + if (err != MNL_CB_OK) + return; + + if (!hlt[DEVLINK_ATTR_HEALTH_REPORTER_NAME] || + !hlt[DEVLINK_ATTR_HEALTH_REPORTER_ERR] || + !hlt[DEVLINK_ATTR_HEALTH_REPORTER_RECOVER] || + !hlt[DEVLINK_ATTR_HEALTH_REPORTER_STATE] || + !hlt[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER] || + !hlt[DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD]) + return; + + if (hlt[DEVLINK_ATTR_HEALTH_REPORTER_DUMP_TS]) { + attr = hlt[DEVLINK_ATTR_HEALTH_REPORTER_DUMP_TS]; + time_ms = mnl_attr_get_u64(attr); + format_logtime(time_ms, dump_time_date); + } + pr_out_handle_start_arr(dl, tb); + + pr_out_str(dl, "name", + mnl_attr_get_str(hlt[DEVLINK_ATTR_HEALTH_REPORTER_NAME])); + if (!dl->json_output) { + __pr_out_newline(); + __pr_out_indent_inc(); + } + state = mnl_attr_get_u8(hlt[DEVLINK_ATTR_HEALTH_REPORTER_STATE]); + pr_out_str(dl, "state", health_state_name(state)); + pr_out_u64(dl, "#err", + mnl_attr_get_u64(hlt[DEVLINK_ATTR_HEALTH_REPORTER_ERR])); + pr_out_u64(dl, "#recover", + mnl_attr_get_u64(hlt[DEVLINK_ATTR_HEALTH_REPORTER_RECOVER])); + pr_out_str(dl, "last_dump_ts", dump_time_date); + pr_out_array_start(dl, "parameters"); + pr_out_entry_start(dl); + pr_out_u64(dl, "grace_period", + mnl_attr_get_u64(hlt[DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD])); + auto_recover = mnl_attr_get_u8(hlt[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER]); + pr_out_bool(dl, "auto_recover", 
auto_recover); + pr_out_entry_end(dl); + pr_out_array_end(dl); + __pr_out_indent_dec(); + __pr_out_indent_dec(); + pr_out_handle_end(dl); +} + +static int cmd_health_show_cb(const struct nlmsghdr *nlh, void *data) +{ + struct genlmsghdr *genl = mnl_nlmsg_get_payload(nlh); + struct nlattr *tb[DEVLINK_ATTR_MAX + 1] = {}; + struct dl *dl = data; + + mnl_attr_parse(nlh, sizeof(*genl), attr_cb, tb); + if (!tb[DEVLINK_ATTR_BUS_NAME] || !tb[DEVLINK_ATTR_DEV_NAME] || + !tb[DEVLINK_ATTR_HEALTH_REPORTER]) + return MNL_CB_ERROR; + + pr_out_health(dl, tb); + + return MNL_CB_OK; +} + +static int cmd_health_show(struct dl *dl) +{ + struct nlmsghdr *nlh; + uint16_t flags = NLM_F_REQUEST | NLM_F_ACK; + int err; + + if (dl_argc(dl) == 0) + flags |= NLM_F_DUMP; + nlh = mnlg_msg_prepare(dl->nlg, DEVLINK_CMD_HEALTH_REPORTER_GET, + flags); + + if (dl_argc(dl) > 0) { + err = dl_argv_parse_put(nlh, dl, + DL_OPT_HANDLE | + DL_OPT_HEALTH_REPORTER_NAME, 0); + if (err) + return err; + } + pr_out_section_start(dl, "health"); + + err = _mnlg_socket_sndrcv(dl->nlg, nlh, cmd_health_show_cb, dl); + if (err) + return err; + pr_out_section_end(dl); + return err; +} + +static void cmd_health_help(void) +{ + pr_err("Usage: devlink health show [ dev DEV reporter REPORTER_NAME ]\n"); + pr_err("Usage: devlink health recover DEV reporter REPORTER_NAME\n"); + pr_err("Usage: devlink health diagnose DEV reporter REPORTER_NAME\n"); + pr_err("Usage: devlink health dump show DEV reporter REPORTER_NAME\n"); + pr_err("Usage: devlink health dump clear DEV reporter REPORTER_NAME\n"); + pr_err("Usage: devlink health set DEV reporter REPORTER_NAME NAME VALUE\n"); +} + +static int cmd_health(struct dl *dl) +{ + if (dl_argv_match(dl, "help")) { + cmd_health_help(); + return 0; + } else if (dl_argv_match(dl, "show") || + dl_argv_match(dl, "list") || dl_no_arg(dl)) { + dl_arg_inc(dl); + return cmd_health_show(dl); + } else if (dl_argv_match(dl, "recover")) { + dl_arg_inc(dl); + return cmd_health_recover(dl); + } else if 
(dl_argv_match(dl, "diagnose")) { + dl_arg_inc(dl); + return cmd_health_diagnose(dl); + } else if (dl_argv_match(dl, "dump")) { + dl_arg_inc(dl); + if (dl_argv_match(dl, "show")) { + dl_arg_inc(dl); + return cmd_health_dump_show(dl); + } else if (dl_argv_match(dl, "clear")) { + dl_arg_inc(dl); + return cmd_health_dump_clear(dl); + } + } else if (dl_argv_match(dl, "set")) { + dl_arg_inc(dl); + return cmd_health_set_params(dl); + } + + pr_err("Command \"%s\" not found\n", dl_argv(dl)); + return -ENOENT; +} + static void help(void) { pr_err("Usage: devlink [ OPTIONS ] OBJECT { COMMAND | help }\n" " devlink [ -f[orce] ] -b[atch] filename\n" - "where OBJECT := { dev | port | sb | monitor | dpipe | resource | region }\n" + "where OBJECT := { dev | port | sb | monitor | dpipe | resource | region | health }\n" " OPTIONS := { -V[ersion] | -n[o-nice-names] | -j[son] | -p[retty] | -v[erbose] }\n"); } @@ -5714,7 +6259,11 @@ static int dl_cmd(struct dl *dl, int argc, char **argv) } else if (dl_argv_match(dl, "region")) { dl_arg_inc(dl); return cmd_region(dl); + } else if (dl_argv_match(dl, "health")) { + dl_arg_inc(dl); + return cmd_health(dl); } + pr_err("Object \"%s\" not found\n", dl_argv(dl)); return -ENOENT; } diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h index d51b59a7b8ee..bec76e94bc47 100644 --- a/include/uapi/linux/devlink.h +++ b/include/uapi/linux/devlink.h @@ -95,6 +95,12 @@ enum devlink_command { DEVLINK_CMD_PORT_PARAM_DEL, DEVLINK_CMD_INFO_GET, /* can dump */ + DEVLINK_CMD_HEALTH_REPORTER_GET, + DEVLINK_CMD_HEALTH_REPORTER_SET, + DEVLINK_CMD_HEALTH_REPORTER_RECOVER, + DEVLINK_CMD_HEALTH_REPORTER_DIAGNOSE, + DEVLINK_CMD_HEALTH_REPORTER_DUMP_GET, + DEVLINK_CMD_HEALTH_REPORTER_DUMP_CLEAR, /* add new commands above here */ __DEVLINK_CMD_MAX, @@ -302,6 +308,23 @@ enum devlink_attr { DEVLINK_ATTR_SB_POOL_CELL_SIZE, /* u32 */ + DEVLINK_ATTR_FMSG, /* nested */ + DEVLINK_ATTR_FMSG_OBJ_NEST_START, /* flag */ + DEVLINK_ATTR_FMSG_PAIR_NEST_START, /* 
flag */ + DEVLINK_ATTR_FMSG_ARR_NEST_START, /* flag */ + DEVLINK_ATTR_FMSG_NEST_END, /* flag */ + DEVLINK_ATTR_FMSG_OBJ_NAME, /* string */ + DEVLINK_ATTR_FMSG_OBJ_VALUE_TYPE, /* u8 */ + DEVLINK_ATTR_FMSG_OBJ_VALUE_DATA, /* dynamic */ + + DEVLINK_ATTR_HEALTH_REPORTER, /* nested */ + DEVLINK_ATTR_HEALTH_REPORTER_NAME, /* string */ + DEVLINK_ATTR_HEALTH_REPORTER_STATE, /* u8 */ + DEVLINK_ATTR_HEALTH_REPORTER_ERR, /* u64 */ + DEVLINK_ATTR_HEALTH_REPORTER_RECOVER, /* u64 */ + DEVLINK_ATTR_HEALTH_REPORTER_DUMP_TS, /* u64 */ + DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD, /* u64 */ + DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER, /* u8 */ /* add new attributes above here, update the policy in devlink.c */ __DEVLINK_ATTR_MAX, diff --git a/man/man8/devlink-health.8 b/man/man8/devlink-health.8 new file mode 100644 index 000000000000..2824d4d1bbf1 --- /dev/null +++ b/man/man8/devlink-health.8 @@ -0,0 +1,176 @@ +.TH DEVLINK\-HEALTH 8 "27 Dec 2018" "iproute2" "Linux" +.SH NAME +devlink-health \- devlink health reporting and recovery +.SH SYNOPSIS +.sp +.ad l +.in +8 +.ti -8 +.B devlink +.RI "[ " OPTIONS " ]" +.B health +.RI " { " COMMAND " | " +.BR help " }" +.sp +.ti -8 +.IR OPTIONS " := { " +\fB\-V\fR[\fIersion\fR] } +.ti -8 +.BR "devlink health show" +.RI "[ " +.RI "" DEV "" +.BR " reporter " +.RI ""REPORTER " ] " +.ti -8 +.BR "devlink health recover" +.RI "" DEV "" +.BR "reporter" +.RI "" REPORTER "" +.ti -8 +.BR "devlink health diagnose" +.RI "" DEV "" +.BR "reporter" +.RI "" REPORTER "" +.ti -8 +.BR "devlink health dump show" +.RI "" DEV "" +.BR "reporter" +.RI "" REPORTER "" +.ti -8 +.BR "devlink health dump clear" +.RI "" DEV "" +.BR "reporter" +.RI "" REPORTER "" +.ti -8 +.BR "devlink health set" +.RI "" DEV "" +.BR "reporter" +.RI "" REPORTER "" +.RI "" NAME "" +.RI "" VALUE "" +.ti -8 +.B devlink health help +.SH "DESCRIPTION" +.SS devlink health show - Show status and configuration on all supported reporters on all devlink devices. 
+.PP
+.I "DEV"
+- specifies the devlink device.
+.PP
+.I "REPORTER"
+- specifies the reporter's name registered on the devlink device.
+.SS devlink health recover - Initiate a recovery operation on a reporter.
+This action performs a recovery and increases the recoveries counter on success.
+.PP
+.I "DEV"
+- specifies the devlink device.
+.PP
+.I "REPORTER"
+- specifies the reporter's name registered on the devlink device.
+.SS devlink health diagnose - Retrieve diagnostics data on a reporter.
+.PP
+.I "DEV"
+- specifies the devlink device.
+.PP
+.I "REPORTER"
+- specifies the reporter's name registered on the devlink device.
+.SS devlink health dump show - Display the last saved dump.
+.PD 0
+.P
+Devlink health saves a single dump per reporter. If a dump is
+.P
+not already stored by the Devlink, this command will generate a new
+.P
+dump. The dump can be generated either automatically when a
+.P
+reporter reports on an error or manually at the user's request.
+.PD
+.PP
+.I "DEV"
+- specifies the devlink device.
+.PP
+.I "REPORTER"
+- specifies the reporter's name registered on the devlink device.
+.SS devlink health dump clear - Delete the saved dump.
+Deleting the saved dump enables generation of a new dump on
+.PD 0
+.P
+the next "devlink health dump show" command.
+.PD
+.PP
+.I "DEV"
+- specifies the devlink device.
+.PP
+.I "REPORTER"
+- specifies the reporter's name registered on the devlink device.
+.SS devlink health set - Enable the user to configure:
+.PD 0
+1) grace_period [msec] - Time interval between auto recoveries.
+.P
+2) auto_recover [true/false] - Indicates whether the devlink should execute automatic recovery on error.
+.PD
+.PP
+.I "DEV"
+- specifies the devlink device.
+.PP
+.I "REPORTER"
+- specifies the reporter's name registered on the devlink device.
+.SH "EXAMPLES" +.PP +devlink health show +.RS 4 +pci/0000:00:09.0: + name tx + state healthy #err 1 #recover 1 last_dump_ts N/A + parameters: + grace period 600 auto_recover true +.RE +.PP +devlink health recover pci/0000:00:09.0 reporter tx +.RS 4 +Initiate recovery on tx reporter registered on pci/0000:00:09.0. +.RE +.PP +devlink health diagnose pci/0000:00:09.0 reporter tx +.RS 4 +.PD 0 +SQs: +.P +sqn: 4283 HW state: 1 stopped: 0 +.P +sqn: 4288 HW state: 1 stopped: 0 +.P +sqn: 4293 HW state: 1 stopped: 0 +.P +sqn: 4298 HW state: 1 stopped: 0 +.PD +.RE +.PP +devlink health dump show pci/0000:00:09.0 reporter tx +.RS 4 +Display the last saved dump on tx reporter registered on pci/0000:00:09.0. +.RE +.PP +devlink health dump clear pci/0000:00:09.0 reporter tx +.RS 4 +Delete saved dump on tx reporter registered on pci/0000:00:09.0. +.RE +.PP +devlink health set pci/0000:00:09.0 reporter tx grace_period 3500 +.RS 4 +Set time interval between auto recoveries to minimum of 3500 mSec on +tx reporter registered on pci/0000:00:09.0. +.RE +.PP +devlink health set pci/0000:00:09.0 reporter tx auto_recover false +.RS 4 +Turn off auto recovery on tx reporter registered on pci/0000:00:09.0. +.RE +.SH SEE ALSO +.BR devlink (8), +.BR devlink-dev (8), +.BR devlink-port (8), +.BR devlink-region (8), +.br + +.SH AUTHOR +Aya Levin diff --git a/man/man8/devlink.8 b/man/man8/devlink.8 index 8d527e7e1d60..13d4dcd908b3 100644 --- a/man/man8/devlink.8 +++ b/man/man8/devlink.8 @@ -7,7 +7,7 @@ devlink \- Devlink tool .in +8 .ti -8 .B devlink -.RI "[ " OPTIONS " ] { " dev | port | monitor | sb | resource | region " } { " COMMAND " | " +.RI "[ " OPTIONS " ] { " dev | port | monitor | sb | resource | region | health " } { " COMMAND " | " .BR help " }" .sp @@ -78,6 +78,10 @@ Turn on verbose output. 
.B region - devlink address region access +.TP +.B health +- devlink reporting and recovery + .SS .I COMMAND @@ -109,6 +113,7 @@ Exit status is 0 if command was successful or a positive integer upon failure. .BR devlink-sb (8), .BR devlink-resource (8), .BR devlink-region (8), +.BR devlink-health (8), .br .SH REPORTING BUGS -- 2.14.1
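A note on the last_dump_ts conversion done by format_logtime() in this patch: it turns the reporter's millisecond timestamp into wall-clock time by subtracting the system uptime from the current time, then adding the timestamp and a 300 second factor. Below is a minimal standalone sketch of that arithmetic, assuming the timestamp counts milliseconds since boot. The helper name dump_ts_to_wallclock() and its parameters are hypothetical and not part of the patch; the factor is passed in explicitly rather than hard-coded.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/*
 * Hypothetical helper (not in the patch): convert a dump timestamp,
 * assumed to be in milliseconds since boot, to wall-clock seconds.
 *
 * now        - current wall-clock time in seconds
 * uptime_sec - seconds since boot, e.g. as reported by sysinfo(2)
 * ts_ms      - dump timestamp in milliseconds
 * factor_sec - extra correction in seconds; the patch uses 300
 */
static time_t dump_ts_to_wallclock(time_t now, long uptime_sec,
				   uint64_t ts_ms, long factor_sec)
{
	time_t ts_sec = (time_t)(ts_ms / 1000);

	/* Uptime is never negative on a running system. */
	assert(uptime_sec >= 0);

	/* Same arithmetic as the patch: now - (uptime - ts) + factor. */
	return now - ((time_t)uptime_sec - ts_sec) + (time_t)factor_sec;
}
```

For example, with now = 1549823400, an uptime of 7200 seconds, a timestamp of 3600000 ms and a factor of 300, the helper returns 1549820100, which could then be passed to localtime() and strftime() as the patch does.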