All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/1] nvme: Add verbose error logging
@ 2021-12-09 17:32 Alan Adamson
  2021-12-09 17:32 ` [PATCH 1/1] " Alan Adamson
  0 siblings, 1 reply; 11+ messages in thread
From: Alan Adamson @ 2021-12-09 17:32 UTC (permalink / raw)
  To: linux-nvme; +Cc: alan.adamson, kbusch, hch, sagi


This patch improves logging for NVMe errors.  Currently, we only get a
a vague idea as to why commands fail since only the block layer status
is captured on error. This patch allows us to see why a command was failed
by the controller. This is very useful when debugging problems in the field.

An example of an imnproved logged error:

[ 4652.812867] nvme0n1: Read @ LBA 1536, 1 blocks, Unrecovered Read Error (sct 0x2 / sc 0x81) DNR 

Alan Adamson (1):
  nvme: Add verbose error logging

 drivers/nvme/host/Kconfig  |   8 ++
 drivers/nvme/host/Makefile |   1 +
 drivers/nvme/host/core.c   |   4 +
 drivers/nvme/host/errors.c | 176 +++++++++++++++++++++++++++++++++++++
 drivers/nvme/host/nvme.h   |   6 ++
 include/linux/nvme.h       |   1 +
 6 files changed, 196 insertions(+)
 create mode 100644 drivers/nvme/host/errors.c

-- 
2.27.0



^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/1] nvme: Add verbose error logging
  2021-12-09 17:32 [PATCH 0/1] nvme: Add verbose error logging Alan Adamson
@ 2021-12-09 17:32 ` Alan Adamson
  2021-12-10 11:36   ` Chaitanya Kulkarni
  0 siblings, 1 reply; 11+ messages in thread
From: Alan Adamson @ 2021-12-09 17:32 UTC (permalink / raw)
  To: linux-nvme; +Cc: alan.adamson, kbusch, hch, sagi

Enable more verbose reporting of NVMe errors. The kernel needs to be compiled
with the NVME_VERBOSE_ERRORS kernel config option enabled.

Logged error example:

nvme0n1: Read @ LBA 2304, 1 blocks, Unrecovered Read Error (sct 0x2 / sc 0x81) DNR 

Signed-off-by: Alan Adamson <alan.adamson@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
---
 drivers/nvme/host/Kconfig  |   8 ++
 drivers/nvme/host/Makefile |   1 +
 drivers/nvme/host/core.c   |   4 +
 drivers/nvme/host/errors.c | 176 +++++++++++++++++++++++++++++++++++++
 drivers/nvme/host/nvme.h   |   6 ++
 include/linux/nvme.h       |   1 +
 6 files changed, 196 insertions(+)
 create mode 100644 drivers/nvme/host/errors.c

diff --git a/drivers/nvme/host/Kconfig b/drivers/nvme/host/Kconfig
index dc0450ca23a3..d6d056963c06 100644
--- a/drivers/nvme/host/Kconfig
+++ b/drivers/nvme/host/Kconfig
@@ -24,6 +24,14 @@ config NVME_MULTIPATH
 	   /dev/nvmeXnY device will show up for each NVMe namespace,
 	   even if it is accessible through multiple controllers.
 
+config NVME_VERBOSE_ERRORS
+	bool "NVMe verbose error reporting"
+	depends on NVME_CORE
+	help
+	   This option enables verbose reporting for NVMe errors. The
+	   error translation table will grow the kernel image size by
+	   about 4 KB.
+
 config NVME_HWMON
 	bool "NVMe hardware monitoring"
 	depends on (NVME_CORE=y && HWMON=y) || (NVME_CORE=m && HWMON)
diff --git a/drivers/nvme/host/Makefile b/drivers/nvme/host/Makefile
index dfaacd472e5d..24fc888e3792 100644
--- a/drivers/nvme/host/Makefile
+++ b/drivers/nvme/host/Makefile
@@ -12,6 +12,7 @@ obj-$(CONFIG_NVME_TCP)			+= nvme-tcp.o
 nvme-core-y				:= core.o ioctl.o
 nvme-core-$(CONFIG_TRACING)		+= trace.o
 nvme-core-$(CONFIG_NVME_MULTIPATH)	+= multipath.o
+nvme-core-$(CONFIG_NVME_VERBOSE_ERRORS)	+= errors.o
 nvme-core-$(CONFIG_BLK_DEV_ZONED)	+= zns.o
 nvme-core-$(CONFIG_FAULT_INJECTION_DEBUG_FS)	+= fault_inject.o
 nvme-core-$(CONFIG_NVME_HWMON)		+= hwmon.o
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 4b5de8f5435a..1e0fb4375b35 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -346,6 +346,8 @@ static inline void nvme_end_req(struct request *req)
 
 void nvme_complete_rq(struct request *req)
 {
+	blk_status_t status = nvme_error_status(nvme_req(req)->status);
+
 	trace_nvme_complete_rq(req);
 	nvme_cleanup_cmd(req);
 
@@ -354,6 +356,8 @@ void nvme_complete_rq(struct request *req)
 
 	switch (nvme_decide_disposition(req)) {
 	case COMPLETE:
+		if (unlikely(status != BLK_STS_OK))
+			nvme_error_log(req);
 		nvme_end_req(req);
 		return;
 	case RETRY:
diff --git a/drivers/nvme/host/errors.c b/drivers/nvme/host/errors.c
new file mode 100644
index 000000000000..4a60e52f55b0
--- /dev/null
+++ b/drivers/nvme/host/errors.c
@@ -0,0 +1,176 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * NVM Express device driver verbose errors
+ * Copyright (c) 2021, Oracle and/or its affiliates
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+
+#include <linux/blkdev.h>
+#include "nvme.h"
+
+struct nvme_string_table {
+	unsigned int code;
+	const unsigned char *string;
+};
+
+static const struct nvme_string_table nvme_ops[] = {
+	{ REQ_OP_READ,			"Read" },
+	{ REQ_OP_WRITE,			"Write" },
+	{ REQ_OP_FLUSH,			"Flush" },
+	{ REQ_OP_DISCARD,		"Deallocate (DSM)" },
+	{ REQ_OP_WRITE_ZEROES,		"Write Zeroes" },
+	{ REQ_OP_ZONE_RESET_ALL,	"Zone Reset All" },
+	{ REQ_OP_ZONE_RESET,		"Zone Reset" },
+	{ REQ_OP_ZONE_OPEN,		"Zone Open" },
+	{ REQ_OP_ZONE_CLOSE,		"Zone Close" },
+	{ REQ_OP_ZONE_FINISH,		"Zone Finish" },
+	{ REQ_OP_ZONE_APPEND,		"Zone Append" },
+};
+#define NVME_OPS_SIZE ARRAY_SIZE(nvme_ops)
+
+static const struct nvme_string_table nvme_errors[] = {
+	{ NVME_SC_SUCCESS,		"Success" },
+	{ NVME_SC_INVALID_OPCODE,	"Invalid Command Opcode" },
+	{ NVME_SC_INVALID_FIELD,	"Invalid Field in Command" },
+	{ NVME_SC_CMDID_CONFLICT,	"Command ID Conflict" },
+	{ NVME_SC_DATA_XFER_ERROR,	"Data Transfer Error" },
+	{ NVME_SC_POWER_LOSS,		"Commands Aborted due to Power Loss Notification" },
+	{ NVME_SC_INTERNAL,		"Internal Error" },
+	{ NVME_SC_ABORT_REQ,		"Command Abort Requested" },
+	{ NVME_SC_ABORT_QUEUE,		"Command Aborted due to SQ Deletion" },
+	{ NVME_SC_FUSED_FAIL,		"Command Aborted due to Failed Fused Command" },
+	{ NVME_SC_FUSED_MISSING,	"Command Aborted due to Missing Fused Command" },
+	{ NVME_SC_INVALID_NS,		"Invalid Namespace or Format" },
+	{ NVME_SC_CMD_SEQ_ERROR,	"Command Sequence Error" },
+	{ NVME_SC_SGL_INVALID_LAST,	"Invalid SGL Segment Descriptor" },
+	{ NVME_SC_SGL_INVALID_COUNT,	"Invalid Number of SGL Descriptors" },
+	{ NVME_SC_SGL_INVALID_DATA,	"Data SGL Length Invalid" },
+	{ NVME_SC_SGL_INVALID_METADATA,	"Metadata SGL Length Invalid" },
+	{ NVME_SC_SGL_INVALID_TYPE,	"SGL Descriptor Type Invalid" },
+	{ NVME_SC_CMB_INVALID_USE,	"Invalid Use of Controller Memory Buffer" },
+	{ NVME_SC_PRP_INVALID_OFFSET,	"PRP Offset Invalid" },
+	{ NVME_SC_ATOMIC_WU_EXCEEDED,	"Atomic Write Unit Exceeded" },
+	{ NVME_SC_OP_DENIED,		"Operation Denied" },
+	{ NVME_SC_SGL_INVALID_OFFSET,	"SGL Offset Invalid" },
+	{ NVME_SC_RESERVED,		"Reserved" },
+	{ NVME_SC_HOST_ID_INCONSIST,	"Host Identifier Inconsistent Format" },
+	{ NVME_SC_KA_TIMEOUT_EXPIRED,	"Keep Alive Timeout Expired" },
+	{ NVME_SC_KA_TIMEOUT_INVALID,	"Keep Alive Timeout Invalid" },
+	{ NVME_SC_ABORTED_PREEMPT_ABORT,	"Command Aborted due to Preempt and Abort" },
+	{ NVME_SC_SANITIZE_FAILED,	"Sanitize Failed" },
+	{ NVME_SC_SANITIZE_IN_PROGRESS,	"Sanitize In Progress" },
+	{ NVME_SC_SGL_INVALID_GRANULARITY,	"SGL Data Block Granularity Invalid" },
+	{ NVME_SC_CMD_NOT_SUP_CMB_QUEUE,	"Command Not Supported for Queue in CMB" },
+	{ NVME_SC_NS_WRITE_PROTECTED,	"Namespace is Write Protected" },
+	{ NVME_SC_CMD_INTERRUPTED,	"Command Interrupted" },
+	{ NVME_SC_TRANSIENT_TR_ERR,	"Transient Transport Error" },
+	{ NVME_SC_LBA_RANGE,		"LBA Out of Range" },
+	{ NVME_SC_CAP_EXCEEDED,		"Capacity Exceeded" },
+	{ NVME_SC_NS_NOT_READY,		"Namespace Not Ready" },
+	{ NVME_SC_RESERVATION_CONFLICT,	"Reservation Conflict" },
+	{ NVME_SC_FORMAT_IN_PROGRESS,	"Format In Progress" },
+	{ NVME_SC_CQ_INVALID,		"Completion Queue Invalid" },
+	{ NVME_SC_QID_INVALID,		"Invalid Queue Identifier" },
+	{ NVME_SC_QUEUE_SIZE,		"Invalid Queue Size" },
+	{ NVME_SC_ABORT_LIMIT,		"Abort Command Limit Exceeded" },
+	{ NVME_SC_ABORT_MISSING,	"Reserved" },
+	{ NVME_SC_ASYNC_LIMIT,		"Asynchronous Event Request Limit Exceeded" },
+	{ NVME_SC_FIRMWARE_SLOT,	"Invalid Firmware Slot" },
+	{ NVME_SC_FIRMWARE_IMAGE,	"Invalid Firmware Image" },
+	{ NVME_SC_INVALID_VECTOR,	"Invalid Interrupt Vector" },
+	{ NVME_SC_INVALID_LOG_PAGE,	"Invalid Log Page" },
+	{ NVME_SC_INVALID_FORMAT,	"Invalid Format" },
+	{ NVME_SC_FW_NEEDS_CONV_RESET,	"Firmware Activation Requires Conventional Reset" },
+	{ NVME_SC_INVALID_QUEUE,	"Invalid Queue Deletion" },
+	{ NVME_SC_FEATURE_NOT_SAVEABLE,	"Feature Identifier Not Saveable" },
+	{ NVME_SC_FEATURE_NOT_CHANGEABLE,	"Feature Not Changeable" },
+	{ NVME_SC_FEATURE_NOT_PER_NS,	"Feature Not Namespace Specific" },
+	{ NVME_SC_FW_NEEDS_SUBSYS_RESET, "Firmware Activation Requires NVM Subsystem Reset" },
+	{ NVME_SC_FW_NEEDS_RESET,	"Firmware Activation Requires Reset" },
+	{ NVME_SC_FW_NEEDS_MAX_TIME,	"Firmware Activation Requires Maximum Time Violation" },
+	{ NVME_SC_FW_ACTIVATE_PROHIBITED,	"Firmware Activation Prohibited" },
+	{ NVME_SC_OVERLAPPING_RANGE,	"Overlapping Range" },
+	{ NVME_SC_NS_INSUFFICIENT_CAP,	"Namespace Insufficient Capacity" },
+	{ NVME_SC_NS_ID_UNAVAILABLE,	"Namespace Identifier Unavailable" },
+	{ NVME_SC_NS_ALREADY_ATTACHED,	"Namespace Already Attached" },
+	{ NVME_SC_NS_IS_PRIVATE,	"Namespace Is Private" },
+	{ NVME_SC_NS_NOT_ATTACHED,	"Namespace Not Attached" },
+	{ NVME_SC_THIN_PROV_NOT_SUPP,	"Thin Provisioning Not Supported" },
+	{ NVME_SC_CTRL_LIST_INVALID,	"Controller List Invalid" },
+	{ NVME_SC_SELT_TEST_IN_PROGRESS,	"Device Self-test In Progress" },
+	{ NVME_SC_BP_WRITE_PROHIBITED,	"Boot Partition Write Prohibited" },
+	{ NVME_SC_CTRL_ID_INVALID,	"Invalid Controller Identifier" },
+	{ NVME_SC_SEC_CTRL_STATE_INVALID,	"Invalid Secondary Controller State" },
+	{ NVME_SC_CTRL_RES_NUM_INVALID,	"Invalid Number of Controller Resources" },
+	{ NVME_SC_RES_ID_INVALID,	"Invalid Resource Identifier" },
+	{ NVME_SC_PMR_SAN_PROHIBITED,	"Sanitize Prohibited" },
+	{ NVME_SC_ANA_GROUP_ID_INVALID,	"ANA Group Identifier Invalid" },
+	{ NVME_SC_ANA_ATTACH_FAILED,	"ANA Attach Failed" },
+	{ NVME_SC_BAD_ATTRIBUTES,	"Conflicting Attributes" },
+	{ NVME_SC_INVALID_PI,		"Invalid Protection Information" },
+	{ NVME_SC_READ_ONLY,		"Attempted Write to Read Only Range" },
+	{ NVME_SC_ONCS_NOT_SUPPORTED,	"ONCS Not Supported" },
+	{ NVME_SC_ZONE_BOUNDARY_ERROR,	"Zoned Boundary Error" },
+	{ NVME_SC_ZONE_FULL,		"Zone Is Full" },
+	{ NVME_SC_ZONE_READ_ONLY,	"Zone Is Read Only" },
+	{ NVME_SC_ZONE_OFFLINE,		"Zone Is Offline" },
+	{ NVME_SC_ZONE_INVALID_WRITE,	"Zone Invalid Write" },
+	{ NVME_SC_ZONE_TOO_MANY_ACTIVE,	"Too Many Active Zones" },
+	{ NVME_SC_ZONE_TOO_MANY_OPEN,	"Too Many Open Zones" },
+	{ NVME_SC_ZONE_INVALID_TRANSITION,	"Invalid Zone State Transition" },
+	{ NVME_SC_WRITE_FAULT,		"Write Fault" },
+	{ NVME_SC_READ_ERROR,		"Unrecovered Read Error" },
+	{ NVME_SC_GUARD_CHECK,		"End-to-end Guard Check Error" },
+	{ NVME_SC_APPTAG_CHECK,		"End-to-end Application Tag Check Error" },
+	{ NVME_SC_REFTAG_CHECK,		"End-to-end Reference Tag Check Error" },
+	{ NVME_SC_COMPARE_FAILED,	"Compare Failure" },
+	{ NVME_SC_ACCESS_DENIED,	"Access Denied" },
+	{ NVME_SC_UNWRITTEN_BLOCK,	"Deallocated or Unwritten Logical Block" },
+	{ NVME_SC_ANA_PERSISTENT_LOSS,	"Asymmetric Access Persistent Loss" },
+	{ NVME_SC_ANA_INACCESSIBLE,	"Asymmetric Access Inaccessible" },
+	{ NVME_SC_ANA_TRANSITION,	"Asymmetric Access Transition" },
+	{ NVME_SC_HOST_PATH_ERROR,	"Host Pathing Error" },
+	{ NVME_SC_HOST_ABORTED_CMD,	"Command Aborted By Host:" },
+};
+#define NVME_ERRORS_SIZE ARRAY_SIZE(nvme_errors)
+
+void nvme_error_log(struct request *req)
+{
+	struct nvme_ns *ns = req->q->queuedata;
+	struct nvme_request *nr = nvme_req(req);
+	const struct nvme_string_table *entry;
+	const unsigned char *op_str  = "Unknown";
+	const unsigned char *err_str = "Unknown";
+	unsigned int i;
+
+	if (!ns)
+		return;
+
+	for (i = 0, entry = nvme_ops ; i < NVME_OPS_SIZE ; i++)
+		if (entry[i].code == (req->cmd_flags & REQ_OP_MASK))
+			op_str = entry[i].string;
+
+	for (i = 0, entry = nvme_errors ; i < NVME_ERRORS_SIZE ; i++)
+		if (entry[i].code == (nr->status & 0x7ff))
+			err_str = entry[i].string;
+
+	pr_err("%s: %s @ LBA %llu, %llu blocks, %s (sct 0x%x / sc 0x%x) %s%s\n",
+	       req->rq_disk ? req->rq_disk->disk_name : "?",
+	       op_str,
+	       (unsigned long long)nvme_sect_to_lba(ns, blk_rq_pos(req)),
+	       (unsigned long long)blk_rq_bytes(req) >> ns->lba_shift,
+	       err_str,
+	       nr->status >> 8 & 7, /* Status Code Type */
+	       nr->status & 0xff,    /* Status Code */
+	       nr->status & NVME_SC_MORE ? "MORE " : "",
+	       nr->status & NVME_SC_DNR  ? "DNR "  : "");
+}
+EXPORT_SYMBOL_GPL(nvme_error_log);
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index b334af8aa264..b994804978c4 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -922,4 +922,10 @@ static inline bool nvme_multi_css(struct nvme_ctrl *ctrl)
 	return (ctrl->ctrl_config & NVME_CC_CSS_MASK) == NVME_CC_CSS_CSI;
 }
 
+#ifdef CONFIG_NVME_VERBOSE_ERRORS
+extern void nvme_error_log(struct request *req);
+#else
+static inline void nvme_error_log(struct request *req) {}
+#endif  /* CONFIG_NVME_VERBOSE_ERRORS */
+
 #endif /* _NVME_H */
diff --git a/include/linux/nvme.h b/include/linux/nvme.h
index 855dd9b3e84b..1f946e5bf7c1 100644
--- a/include/linux/nvme.h
+++ b/include/linux/nvme.h
@@ -1636,6 +1636,7 @@ enum {
 	NVME_SC_HOST_ABORTED_CMD	= 0x371,
 
 	NVME_SC_CRD			= 0x1800,
+	NVME_SC_MORE			= 0x2000,
 	NVME_SC_DNR			= 0x4000,
 };
 
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/1] nvme: Add verbose error logging
  2021-12-09 17:32 ` [PATCH 1/1] " Alan Adamson
@ 2021-12-10 11:36   ` Chaitanya Kulkarni
  2021-12-10 15:54     ` Martin K. Petersen
  0 siblings, 1 reply; 11+ messages in thread
From: Chaitanya Kulkarni @ 2021-12-10 11:36 UTC (permalink / raw)
  To: Alan Adamson; +Cc: kbusch, hch, sagi, linux-nvme

Alan,

On 12/9/21 9:32 AM, Alan Adamson wrote:
> Enable more verbose reporting of NVMe errors. The kernel needs to be compiled
> with the NVME_VERBOSE_ERRORS kernel config option enabled.
> 
> Logged error example:
> 
> nvme0n1: Read @ LBA 2304, 1 blocks, Unrecovered Read Error (sct 0x2 / sc 0x81) DNR
> 
> Signed-off-by: Alan Adamson <alan.adamson@oracle.com>
> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
> ---

Although I always prefer descriptive error messages, I've not seen even
half of these errors in the field. This will :-

1. Increase the size of the driver code.
2. Require constant patching of the driver code when spec is updated,
    I'm afraid this will turn into churn.

Also, we can easily track down the error in the NVMe spec from the error
code that driver is printing currently, is that not sufficient ?
if not please provide a scenario, we can always improve bare minimum
error logging.

-ck

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/1] nvme: Add verbose error logging
  2021-12-10 11:36   ` Chaitanya Kulkarni
@ 2021-12-10 15:54     ` Martin K. Petersen
  2021-12-10 16:56       ` Keith Busch
  0 siblings, 1 reply; 11+ messages in thread
From: Martin K. Petersen @ 2021-12-10 15:54 UTC (permalink / raw)
  To: Chaitanya Kulkarni; +Cc: Alan Adamson, kbusch, hch, sagi, linux-nvme


Hi Chaitanya!

> Although I always prefer descriptive error messages, I've not seen even
> half of these errors in the field. This will :-
>
> 1. Increase the size of the driver code.
> 2. Require constant patching of the driver code when spec is updated,
>    I'm afraid this will turn into churn.

Updating the corresponding list in SCSI hasn't been a big problem.

> Also, we can easily track down the error in the NVMe spec from the
> error code that driver is printing currently, is that not sufficient ?

Only the block layer status is captured.

In addition, customers are not really armed with the NVMe error decoder
ring and are left with no explanation as to why an I/O failed. We have
had good results in the field with the verbose SCSI error logging and
that's why this patch was modeled to work the same way.

-- 
Martin K. Petersen	Oracle Linux Engineering


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/1] nvme: Add verbose error logging
  2021-12-10 15:54     ` Martin K. Petersen
@ 2021-12-10 16:56       ` Keith Busch
  2021-12-12  9:52         ` Sagi Grimberg
                           ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Keith Busch @ 2021-12-10 16:56 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Chaitanya Kulkarni, Alan Adamson, hch, sagi, linux-nvme

On Fri, Dec 10, 2021 at 10:54:45AM -0500, Martin K. Petersen wrote:
> 
> Hi Chaitanya!
> 
> > Although I always prefer descriptive error messages, I've not seen even
> > half of these errors in the field. This will :-
> >
> > 1. Increase the size of the driver code.
> > 2. Require constant patching of the driver code when spec is updated,
> >    I'm afraid this will turn into churn.
> 
> Updating the corresponding list in SCSI hasn't been a big problem.
> 
> > Also, we can easily track down the error in the NVMe spec from the
> > error code that driver is printing currently, is that not sufficient ?
> 
> Only the block layer status is captured.
> 
> In addition, customers are not really armed with the NVMe error decoder
> ring and are left with no explanation as to why an I/O failed. We have
> had good results in the field with the verbose SCSI error logging and
> that's why this patch was modeled to work the same way.

If they are not familiar with the decoder, does seeing something like
"Conflicting Attributes" really provide the user a better sense than
"0x280"? Either way, it sounds like they'd just open a bug with someone
who is familiar with the spec, in which case numeric codes should be
sufficient.

Are the existing nvme trace events not providing enough information? The
existing ones appear to have even greater detail than what this patch
provides. Or is there some other reason why the kernel log is preferred
over the event tracer for capturing these types of errors?

On the patch itself:

I'm not sure having it be a new Kconfig option is a good idea since it
requires a recompile to enable/disable. Could this be toggled with a
static_key instead so we may disable/enable verbose logging at runtime?

The new "struct nvme_string_table" is a bit inefficient. Unknown errors
will scan the entire table, which will slow down our hot completion path
on any error. The codes are small and mostly sequential, so I think you
could create a sparse "static const char * const[]", then the "code"
becomes the index that can be looked up in constant time. The total
space of the array shouldn't be much different than the proposal.

A significant amount of the new strings are specific to admin commands,
yet errors for admin commands skip the logging. Why define unreachable
strings? Or perhaps a better question: why skip logging admin errors?

IO errors can happen in large batches, so perhaps pr_err() should be
ratelimted.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/1] nvme: Add verbose error logging
  2021-12-10 16:56       ` Keith Busch
@ 2021-12-12  9:52         ` Sagi Grimberg
  2021-12-13  8:09           ` Chaitanya Kulkarni
  2021-12-14  0:31           ` Alan Adamson
  2021-12-13  8:12         ` hch
  2021-12-14  3:01         ` Martin K. Petersen
  2 siblings, 2 replies; 11+ messages in thread
From: Sagi Grimberg @ 2021-12-12  9:52 UTC (permalink / raw)
  To: Keith Busch, Martin K. Petersen
  Cc: Chaitanya Kulkarni, Alan Adamson, hch, linux-nvme



On 12/10/21 6:56 PM, Keith Busch wrote:
> On Fri, Dec 10, 2021 at 10:54:45AM -0500, Martin K. Petersen wrote:
>>
>> Hi Chaitanya!
>>
>>> Although I always prefer descriptive error messages, I've not seen even
>>> half of these errors in the field. This will :-
>>>
>>> 1. Increase the size of the driver code.
>>> 2. Require constant patching of the driver code when spec is updated,
>>>     I'm afraid this will turn into churn.
>>
>> Updating the corresponding list in SCSI hasn't been a big problem.
>>
>>> Also, we can easily track down the error in the NVMe spec from the
>>> error code that driver is printing currently, is that not sufficient ?
>>
>> Only the block layer status is captured.
>>
>> In addition, customers are not really armed with the NVMe error decoder
>> ring and are left with no explanation as to why an I/O failed. We have
>> had good results in the field with the verbose SCSI error logging and
>> that's why this patch was modeled to work the same way.
> 
> If they are not familiar with the decoder, does seeing something like
> "Conflicting Attributes" really provide the user a better sense than
> "0x280"? Either way, it sounds like they'd just open a bug with someone
> who is familiar with the spec, in which case numeric codes should be
> sufficient.
> 
> Are the existing nvme trace events not providing enough information? The
> existing ones appear to have even greater detail than what this patch
> provides. Or is there some other reason why the kernel log is preferred
> over the event tracer for capturing these types of errors?

Maybe the string info be added to the trace events? If so, we may
need a verbosity level for the trace to record "only errors".

Although the one advantage for the log would be the first time error
status (or for generally rare errors).

> On the patch itself:
> 
> I'm not sure having it be a new Kconfig option is a good idea since it
> requires a recompile to enable/disable. Could this be toggled with a
> static_key instead so we may disable/enable verbose logging at runtime?
> 
> The new "struct nvme_string_table" is a bit inefficient. Unknown errors
> will scan the entire table, which will slow down our hot completion path
> on any error. The codes are small and mostly sequential, so I think you
> could create a sparse "static const char * const[]", then the "code"
> becomes the index that can be looked up in constant time. The total
> space of the array shouldn't be much different than the proposal.

100% agree.

> A significant amount of the new strings are specific to admin commands,
> yet errors for admin commands skip the logging. Why define unreachable
> strings? Or perhaps a better question: why skip logging admin errors?
> 
> IO errors can happen in large batches, so perhaps pr_err() should be
> ratelimted.

Agree.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/1] nvme: Add verbose error logging
  2021-12-12  9:52         ` Sagi Grimberg
@ 2021-12-13  8:09           ` Chaitanya Kulkarni
  2021-12-14  0:31           ` Alan Adamson
  1 sibling, 0 replies; 11+ messages in thread
From: Chaitanya Kulkarni @ 2021-12-13  8:09 UTC (permalink / raw)
  To: Alan Adamson
  Cc: hch, linux-nvme, Sagi Grimberg, Keith Busch, Martin K. Petersen

On 12/12/21 1:52 AM, Sagi Grimberg wrote:
> External email: Use caution opening links or attachments
> 
> 
> On 12/10/21 6:56 PM, Keith Busch wrote:
>> On Fri, Dec 10, 2021 at 10:54:45AM -0500, Martin K. Petersen wrote:
>>>
>>> Hi Chaitanya!
>>>
>>>> Although I always prefer descriptive error messages, I've not seen even
>>>> half of these errors in the field. This will :-
>>>>
>>>> 1. Increase the size of the driver code.
>>>> 2. Require constant patching of the driver code when spec is updated,
>>>>     I'm afraid this will turn into churn.
>>>
>>> Updating the corresponding list in SCSI hasn't been a big problem.
>>>
>>>> Also, we can easily track down the error in the NVMe spec from the
>>>> error code that driver is printing currently, is that not sufficient ?
>>>
>>> Only the block layer status is captured.
>>>
>>> In addition, customers are not really armed with the NVMe error decoder
>>> ring and are left with no explanation as to why an I/O failed. We have
>>> had good results in the field with the verbose SCSI error logging and
>>> that's why this patch was modeled to work the same way.
>>
>> If they are not familiar with the decoder, does seeing something like
>> "Conflicting Attributes" really provide the user a better sense than
>> "0x280"? Either way, it sounds like they'd just open a bug with someone
>> who is familiar with the spec, in which case numeric codes should be
>> sufficient.
>>
>> Are the existing nvme trace events not providing enough information? The
>> existing ones appear to have even greater detail than what this patch
>> provides. Or is there some other reason why the kernel log is preferred
>> over the event tracer for capturing these types of errors?
> 
> Maybe the string info be added to the trace events? If so, we may
> need a verbosity level for the trace to record "only errors".
> 
> Although the one advantage for the log would be the first time error
> status (or for generally rare errors).
> 
>> On the patch itself:
>>
>> I'm not sure having it be a new Kconfig option is a good idea since it
>> requires a recompile to enable/disable. Could this be toggled with a
>> static_key instead so we may disable/enable verbose logging at runtime?
>>
>> The new "struct nvme_string_table" is a bit inefficient. Unknown errors
>> will scan the entire table, which will slow down our hot completion path
>> on any error. The codes are small and mostly sequential, so I think you
>> could create a sparse "static const char * const[]", then the "code"
>> becomes the index that can be looked up in constant time. The total
>> space of the array shouldn't be much different than the proposal.
> 
> 100% agree.
> 
>

Ideally we should document the performance numbers if we are going
to add anything in the fast path.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/1] nvme: Add verbose error logging
  2021-12-10 16:56       ` Keith Busch
  2021-12-12  9:52         ` Sagi Grimberg
@ 2021-12-13  8:12         ` hch
  2021-12-13  9:28           ` Sagi Grimberg
  2021-12-14  3:01         ` Martin K. Petersen
  2 siblings, 1 reply; 11+ messages in thread
From: hch @ 2021-12-13  8:12 UTC (permalink / raw)
  To: Keith Busch
  Cc: Martin K. Petersen, Chaitanya Kulkarni, Alan Adamson, hch, sagi,
	linux-nvme

On Fri, Dec 10, 2021 at 08:56:48AM -0800, Keith Busch wrote:
> I'm not sure having it be a new Kconfig option is a good idea since it
> requires a recompile to enable/disable. Could this be toggled with a
> static_key instead so we may disable/enable verbose logging at runtime?

We need to think about module size.  NVMe is not only used in big
enterprise setups but also in deeply embedded scenarious these days.

I'm also not really convince of the usefulness (but also not strictly
againt it), but if we support it it needs to be conditional.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/1] nvme: Add verbose error logging
  2021-12-13  8:12         ` hch
@ 2021-12-13  9:28           ` Sagi Grimberg
  0 siblings, 0 replies; 11+ messages in thread
From: Sagi Grimberg @ 2021-12-13  9:28 UTC (permalink / raw)
  To: hch, Keith Busch
  Cc: Martin K. Petersen, Chaitanya Kulkarni, Alan Adamson, linux-nvme

> On Fri, Dec 10, 2021 at 08:56:48AM -0800, Keith Busch wrote:
>> I'm not sure having it be a new Kconfig option is a good idea since it
>> requires a recompile to enable/disable. Could this be toggled with a
>> static_key instead so we may disable/enable verbose logging at runtime?
> 
> We need to think about module size.  NVMe is not only used in big
> enterprise setups but also in deeply embedded scenarious these days.
> 
> I'm also not really convince of the usefulness (but also not strictly
> againt it), but if we support it it needs to be conditional.

Agree, the submitted patch made it conditional though.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/1] nvme: Add verbose error logging
  2021-12-12  9:52         ` Sagi Grimberg
  2021-12-13  8:09           ` Chaitanya Kulkarni
@ 2021-12-14  0:31           ` Alan Adamson
  1 sibling, 0 replies; 11+ messages in thread
From: Alan Adamson @ 2021-12-14  0:31 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Keith Busch, Martin Petersen, Chaitanya Kulkarni, hch, linux-nvme


The main purpose of the patch is to get a bit more info into the logs on customer production
systems.   

What if we drop the string and just display the status, DNR and MORE bits:

[   56.058003] nvme0n1: Read @ LBA 2048, 1 blocks, Unrecovered Read Error (sct 0x2 / sc 0x81) DNR MORE
[   56.058075] critical medium error, dev nvme0n1, sector 2048 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0

This would be extremely useful when agoing through customer logs.


Alan


> On Dec 12, 2021, at 1:52 AM, Sagi Grimberg <sagi@grimberg.me> wrote:
> 
> 
> 
> On 12/10/21 6:56 PM, Keith Busch wrote:
>> On Fri, Dec 10, 2021 at 10:54:45AM -0500, Martin K. Petersen wrote:
>>> 
>>> Hi Chaitanya!
>>> 
>>>> Although I always prefer descriptive error messages, I've not seen even
>>>> half of these errors in the field. This will :-
>>>> 
>>>> 1. Increase the size of the driver code.
>>>> 2. Require constant patching of the driver code when spec is updated,
>>>>    I'm afraid this will turn into churn.
>>> 
>>> Updating the corresponding list in SCSI hasn't been a big problem.
>>> 
>>>> Also, we can easily track down the error in the NVMe spec from the
>>>> error code that driver is printing currently, is that not sufficient ?
>>> 
>>> Only the block layer status is captured.
>>> 
>>> In addition, customers are not really armed with the NVMe error decoder
>>> ring and are left with no explanation as to why an I/O failed. We have
>>> had good results in the field with the verbose SCSI error logging and
>>> that's why this patch was modeled to work the same way.
>> If they are not familiar with the decoder, does seeing something like
>> "Conflicting Attributes" really provide the user a better sense than
>> "0x280"? Either way, it sounds like they'd just open a bug with someone
>> who is familiar with the spec, in which case numeric codes should be
>> sufficient.
>> Are the existing nvme trace events not providing enough information? The
>> existing ones appear to have even greater detail than what this patch
>> provides. Or is there some other reason why the kernel log is preferred
>> over the event tracer for capturing these types of errors?
> 
> Maybe the string info be added to the trace events? If so, we may
> need a verbosity level for the trace to record "only errors".
> 
> Although the one advantage for the log would be the first time error
> status (or for generally rare errors).
> 
>> On the patch itself:
>> I'm not sure having it be a new Kconfig option is a good idea since it
>> requires a recompile to enable/disable. Could this be toggled with a
>> static_key instead so we may disable/enable verbose logging at runtime?
>> The new "struct nvme_string_table" is a bit inefficient. Unknown errors
>> will scan the entire table, which will slow down our hot completion path
>> on any error. The codes are small and mostly sequential, so I think you
>> could create a sparse "static const char * const[]", then the "code"
>> becomes the index that can be looked up in constant time. The total
>> space of the array shouldn't be much different than the proposal.
> 
> 100% agree.
> 
>> A significant amount of the new strings are specific to admin commands,
>> yet errors for admin commands skip the logging. Why define unreachable
>> strings? Or perhaps a better question: why skip logging admin errors?
>> IO errors can happen in large batches, so perhaps pr_err() should be
>> ratelimted.
> 
> Agree.



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/1] nvme: Add verbose error logging
  2021-12-10 16:56       ` Keith Busch
  2021-12-12  9:52         ` Sagi Grimberg
  2021-12-13  8:12         ` hch
@ 2021-12-14  3:01         ` Martin K. Petersen
  2 siblings, 0 replies; 11+ messages in thread
From: Martin K. Petersen @ 2021-12-14  3:01 UTC (permalink / raw)
  To: Keith Busch
  Cc: Martin K. Petersen, Chaitanya Kulkarni, Alan Adamson, hch, sagi,
	linux-nvme


Keith,

> If they are not familiar with the decoder, does seeing something like
> "Conflicting Attributes" really provide the user a better sense than
> "0x280"? Either way, it sounds like they'd just open a bug with someone
> who is familiar with the spec, in which case numeric codes should be
> sufficient.

An English error string is very valuable in a support situation. It is
much easier to discuss the implications of "Unrecoverable Read Error"
than some random hex number.

> Are the existing nvme trace events not providing enough information?
> The existing ones appear to have even greater detail than what this
> patch provides. Or is there some other reason why the kernel log is
> preferred over the event tracer for capturing these types of errors?

Logging is imperative. We have to be able to root cause an issue after
the fact.

> I'm not sure having it be a new Kconfig option is a good idea since it
> requires a recompile to enable/disable. Could this be toggled with a
> static_key instead so we may disable/enable verbose logging at
> runtime?

In SCSI it's a Kconfig option because the embedded folks do not want to
waste 8K. We did it the same way here. Enabling verbose logging probably
doesn't make much sense unless you're an enterprise distro.

> The new "struct nvme_string_table" is a bit inefficient. Unknown errors
> will scan the entire table, which will slow down our hot completion path
> on any error. The codes are small and mostly sequential, so I think you
> could create a sparse "static const char * const[]", then the "code"
> becomes the index that can be looked up in constant time. The total
> space of the array shouldn't be much different than the proposal.

We can look into that.

> A significant amount of the new strings are specific to admin
> commands, yet errors for admin commands skip the logging. Why define
> unreachable strings? Or perhaps a better question: why skip logging
> admin errors?

Good point.

> IO errors can happen in large batches, so perhaps pr_err() should be
> ratelimted.

Sure.

-- 
Martin K. Petersen	Oracle Linux Engineering


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2021-12-14  3:01 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-09 17:32 [PATCH 0/1] nvme: Add verbose error logging Alan Adamson
2021-12-09 17:32 ` [PATCH 1/1] " Alan Adamson
2021-12-10 11:36   ` Chaitanya Kulkarni
2021-12-10 15:54     ` Martin K. Petersen
2021-12-10 16:56       ` Keith Busch
2021-12-12  9:52         ` Sagi Grimberg
2021-12-13  8:09           ` Chaitanya Kulkarni
2021-12-14  0:31           ` Alan Adamson
2021-12-13  8:12         ` hch
2021-12-13  9:28           ` Sagi Grimberg
2021-12-14  3:01         ` Martin K. Petersen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.