linux-kernel.vger.kernel.org archive mirror
* [PATCH v2 0/7] nvme-pci: support device coredump
@ 2019-05-07 16:58 Akinobu Mita
  2019-05-07 16:58 ` [PATCH v2 1/7] devcoredump: use memory_read_from_buffer Akinobu Mita
                   ` (7 more replies)
  0 siblings, 8 replies; 22+ messages in thread
From: Akinobu Mita @ 2019-05-07 16:58 UTC (permalink / raw)
  To: linux-nvme, linux-kernel
  Cc: Akinobu Mita, Johannes Berg, Keith Busch, Jens Axboe,
	Christoph Hellwig, Sagi Grimberg, Minwoo Im

This enables to capture snapshot of controller information via device
coredump machanism, and it helps diagnose and debug issues.

The nvme device coredump is triggered when command timeout occurs, and
creates the following coredump files.

 - regs: NVMe controller registers (00h to 4Fh)
 - sq<qid>: Submission queue
 - cq<qid>: Completion queue
 - telemetry-ctrl-log: Telemetry controller-initiated log (if available)
 - data: Empty

(I don't currently have an NVMe device that supports the telemetry log
page, so capturing the telemetry log is untested.)
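
Each dump appears as a set of sysfs binary files under
/sys/class/devcoredump/devcd<N>/, and writing anything back to the
'data' file releases the dump once it has been collected.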

The device coredump mechanism currently allows drivers to create only a
single coredump file, so this series also provides a new function that
lets drivers create several device coredump files for one crashed device.

* v2
- Add Reviewed-by tag.
- Add patch to fix typo in comment
- Remove unneeded braces.
- Allocate device_entry followed by an array of devcd_file elements.
- Add telemetry log page definitions
- Add facility to check log page attributes
- Exclude the doorbell registers from register dump.
- Save controller registers in a binary format instead of a text format.
- Create an empty 'data' file in the device coredump.
- Save telemetry controller-initiated log if available
- Make coredump procedure into two phases (before resetting controller and
  after resetting as soon as admin queue is available).

Akinobu Mita (7):
  devcoredump: use memory_read_from_buffer
  devcoredump: fix typo in comment
  devcoredump: allow to create several coredump files in one device
  nvme.h: add telemetry log page definitions
  nvme: add facility to check log page attributes
  nvme-pci: add device coredump support
  nvme-pci: trigger device coredump on command timeout

 drivers/base/devcoredump.c  | 168 ++++++++++------
 drivers/nvme/host/Kconfig   |   1 +
 drivers/nvme/host/core.c    |   2 +
 drivers/nvme/host/nvme.h    |   1 +
 drivers/nvme/host/pci.c     | 460 ++++++++++++++++++++++++++++++++++++++++++--
 include/linux/devcoredump.h |  33 ++++
 include/linux/nvme.h        |  25 +++
 7 files changed, 617 insertions(+), 73 deletions(-)

Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Minwoo Im <minwoo.im.dev@gmail.com>
-- 
2.7.4



* [PATCH v2 1/7] devcoredump: use memory_read_from_buffer
  2019-05-07 16:58 [PATCH v2 0/7] nvme-pci: support device coredump Akinobu Mita
@ 2019-05-07 16:58 ` Akinobu Mita
  2019-05-07 16:58 ` [PATCH v2 2/7] devcoredump: fix typo in comment Akinobu Mita
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 22+ messages in thread
From: Akinobu Mita @ 2019-05-07 16:58 UTC (permalink / raw)
  To: linux-nvme, linux-kernel
  Cc: Akinobu Mita, Johannes Berg, Keith Busch, Jens Axboe,
	Christoph Hellwig, Sagi Grimberg, Minwoo Im

Use memory_read_from_buffer() to simplify devcd_readv().
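
For reference, memory_read_from_buffer() clamps the read to the available
data and advances the position, roughly like the simplified sketch below
(not the exact kernel source). One subtle difference from the removed
code: an offset past the end now reads as EOF (0) instead of -EINVAL.

	ssize_t memory_read_from_buffer(void *to, size_t count, loff_t *ppos,
					const void *from, size_t available)
	{
		loff_t pos = *ppos;

		if (pos < 0)
			return -EINVAL;
		if (pos >= available || !count)
			return 0;
		if (count > available - pos)
			count = available - pos;

		memcpy(to, from + pos, count);
		*ppos = pos + count;

		return count;
	}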

Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
---
* v2
- Add Reviewed-by tag.

 drivers/base/devcoredump.c | 11 +----------
 1 file changed, 1 insertion(+), 10 deletions(-)

diff --git a/drivers/base/devcoredump.c b/drivers/base/devcoredump.c
index f1a3353..3c960a6 100644
--- a/drivers/base/devcoredump.c
+++ b/drivers/base/devcoredump.c
@@ -164,16 +164,7 @@ static struct class devcd_class = {
 static ssize_t devcd_readv(char *buffer, loff_t offset, size_t count,
 			   void *data, size_t datalen)
 {
-	if (offset > datalen)
-		return -EINVAL;
-
-	if (offset + count > datalen)
-		count = datalen - offset;
-
-	if (count)
-		memcpy(buffer, ((u8 *)data) + offset, count);
-
-	return count;
+	return memory_read_from_buffer(buffer, count, &offset, data, datalen);
 }
 
 static void devcd_freev(void *data)
-- 
2.7.4



* [PATCH v2 2/7] devcoredump: fix typo in comment
  2019-05-07 16:58 [PATCH v2 0/7] nvme-pci: support device coredump Akinobu Mita
  2019-05-07 16:58 ` [PATCH v2 1/7] devcoredump: use memory_read_from_buffer Akinobu Mita
@ 2019-05-07 16:58 ` Akinobu Mita
  2019-05-07 16:58 ` [PATCH v2 3/7] devcoredump: allow to create several coredump files in one device Akinobu Mita
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 22+ messages in thread
From: Akinobu Mita @ 2019-05-07 16:58 UTC (permalink / raw)
  To: linux-nvme, linux-kernel
  Cc: Akinobu Mita, Johannes Berg, Keith Busch, Jens Axboe,
	Christoph Hellwig, Sagi Grimberg, Minwoo Im

s/dev_coredumpmsg/dev_coredumpsg/

Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
---
* v2
- New patch in this version.

 drivers/base/devcoredump.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/base/devcoredump.c b/drivers/base/devcoredump.c
index 3c960a6..e42d0b5 100644
--- a/drivers/base/devcoredump.c
+++ b/drivers/base/devcoredump.c
@@ -314,7 +314,7 @@ void dev_coredumpm(struct device *dev, struct module *owner,
 EXPORT_SYMBOL_GPL(dev_coredumpm);
 
 /**
- * dev_coredumpmsg - create device coredump that uses scatterlist as data
+ * dev_coredumpsg - create device coredump that uses scatterlist as data
  * parameter
  * @dev: the struct device for the crashed device
  * @table: the dump data
-- 
2.7.4



* [PATCH v2 3/7] devcoredump: allow to create several coredump files in one device
  2019-05-07 16:58 [PATCH v2 0/7] nvme-pci: support device coredump Akinobu Mita
  2019-05-07 16:58 ` [PATCH v2 1/7] devcoredump: use memory_read_from_buffer Akinobu Mita
  2019-05-07 16:58 ` [PATCH v2 2/7] devcoredump: fix typo in comment Akinobu Mita
@ 2019-05-07 16:58 ` Akinobu Mita
  2019-05-07 17:35   ` Heitke, Kenneth
  2019-05-07 16:58 ` [PATCH v2 4/7] nvme.h: add telemetry log page definitions Akinobu Mita
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 22+ messages in thread
From: Akinobu Mita @ 2019-05-07 16:58 UTC (permalink / raw)
  To: linux-nvme, linux-kernel
  Cc: Akinobu Mita, Johannes Berg, Keith Busch, Jens Axboe,
	Christoph Hellwig, Sagi Grimberg, Minwoo Im

The device coredump mechanism currently allows drivers to create only a
single coredump file.  If there are several binary blobs to dump, we would
need to define a binary container format, or convert everything to text,
in order to pack them into a single coredump file.

This provides a new function that allows drivers to create several device
coredump files for one crashed device.
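
As a minimal sketch (a hypothetical driver with made-up file names), a
user of the new interface fills an array of dev_coredumpm_bulk_data and
hands it over in one call.  Note that the name strings must stay valid
for the lifetime of the dump, since the sysfs attributes reference them
directly:

	static ssize_t blob_read(char *buffer, loff_t offset, size_t count,
				 void *data, size_t datalen)
	{
		return memory_read_from_buffer(buffer, count, &offset,
					       data, datalen);
	}

	static void blob_free(void *data)
	{
		kfree(data);
	}

	static void example_crash_dump(struct device *dev,
				       void *fw, size_t fw_len,
				       void *trace, size_t trace_len)
	{
		struct dev_coredumpm_bulk_data files[] = {
			{
				.name = "fw-state",
				.data = fw,
				.datalen = fw_len,
				.read = blob_read,
				.free = blob_free,
			},
			{
				.name = "trace",
				.data = trace,
				.datalen = trace_len,
				.read = blob_read,
				.free = blob_free,
			},
		};

		dev_coredumpm_bulk(dev, THIS_MODULE, GFP_KERNEL,
				   files, ARRAY_SIZE(files));
	}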

Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
---
* v2
- Remove unneeded braces.
- Allocate device_entry followed by an array of devcd_file elements.

 drivers/base/devcoredump.c  | 155 ++++++++++++++++++++++++++++++--------------
 include/linux/devcoredump.h |  33 ++++++++++
 2 files changed, 139 insertions(+), 49 deletions(-)

diff --git a/drivers/base/devcoredump.c b/drivers/base/devcoredump.c
index e42d0b5..4dd6dba 100644
--- a/drivers/base/devcoredump.c
+++ b/drivers/base/devcoredump.c
@@ -25,16 +25,20 @@ static bool devcd_disabled;
 /* if data isn't read by userspace after 5 minutes then delete it */
 #define DEVCD_TIMEOUT	(HZ * 60 * 5)
 
-struct devcd_entry {
-	struct device devcd_dev;
-	void *data;
-	size_t datalen;
-	struct module *owner;
+struct devcd_file {
+	struct bin_attribute bin_attr;
 	ssize_t (*read)(char *buffer, loff_t offset, size_t count,
 			void *data, size_t datalen);
 	void (*free)(void *data);
+};
+
+struct devcd_entry {
+	struct device devcd_dev;
+	struct module *owner;
 	struct delayed_work del_wk;
 	struct device *failing_dev;
+	int num_files;
+	struct devcd_file files[];
 };
 
 static struct devcd_entry *dev_to_devcd(struct device *dev)
@@ -45,8 +49,14 @@ static struct devcd_entry *dev_to_devcd(struct device *dev)
 static void devcd_dev_release(struct device *dev)
 {
 	struct devcd_entry *devcd = dev_to_devcd(dev);
+	int i;
+
+	for (i = 0; i < devcd->num_files; i++) {
+		struct devcd_file *file = &devcd->files[i];
+
+		file->free(file->bin_attr.private);
+	}
 
-	devcd->free(devcd->data);
 	module_put(devcd->owner);
 
 	/*
@@ -64,9 +74,14 @@ static void devcd_dev_release(struct device *dev)
 static void devcd_del(struct work_struct *wk)
 {
 	struct devcd_entry *devcd;
+	int i;
 
 	devcd = container_of(wk, struct devcd_entry, del_wk.work);
 
+	for (i = 0; i < devcd->num_files; i++)
+		device_remove_bin_file(&devcd->devcd_dev,
+				       &devcd->files[i].bin_attr);
+
 	device_del(&devcd->devcd_dev);
 	put_device(&devcd->devcd_dev);
 }
@@ -75,10 +90,11 @@ static ssize_t devcd_data_read(struct file *filp, struct kobject *kobj,
 			       struct bin_attribute *bin_attr,
 			       char *buffer, loff_t offset, size_t count)
 {
-	struct device *dev = kobj_to_dev(kobj);
-	struct devcd_entry *devcd = dev_to_devcd(dev);
+	struct devcd_file *file =
+		container_of(bin_attr, struct devcd_file, bin_attr);
 
-	return devcd->read(buffer, offset, count, devcd->data, devcd->datalen);
+	return file->read(buffer, offset, count, bin_attr->private,
+			  bin_attr->size);
 }
 
 static ssize_t devcd_data_write(struct file *filp, struct kobject *kobj,
@@ -93,25 +109,6 @@ static ssize_t devcd_data_write(struct file *filp, struct kobject *kobj,
 	return count;
 }
 
-static struct bin_attribute devcd_attr_data = {
-	.attr = { .name = "data", .mode = S_IRUSR | S_IWUSR, },
-	.size = 0,
-	.read = devcd_data_read,
-	.write = devcd_data_write,
-};
-
-static struct bin_attribute *devcd_dev_bin_attrs[] = {
-	&devcd_attr_data, NULL,
-};
-
-static const struct attribute_group devcd_dev_group = {
-	.bin_attrs = devcd_dev_bin_attrs,
-};
-
-static const struct attribute_group *devcd_dev_groups[] = {
-	&devcd_dev_group, NULL,
-};
-
 static int devcd_free(struct device *dev, void *data)
 {
 	struct devcd_entry *devcd = dev_to_devcd(dev);
@@ -157,7 +154,6 @@ static struct class devcd_class = {
 	.name		= "devcoredump",
 	.owner		= THIS_MODULE,
 	.dev_release	= devcd_dev_release,
-	.dev_groups	= devcd_dev_groups,
 	.class_groups	= devcd_class_groups,
 };
 
@@ -234,30 +230,55 @@ static ssize_t devcd_read_from_sgtable(char *buffer, loff_t offset,
 				  offset);
 }
 
+static struct devcd_entry *devcd_alloc(struct dev_coredumpm_bulk_data *files,
+				       int num_files, gfp_t gfp)
+{
+	struct devcd_entry *devcd;
+	int i;
+
+	devcd = kzalloc(struct_size(devcd, files, num_files), gfp);
+	if (!devcd)
+		return NULL;
+
+	devcd->num_files = num_files;
+
+	for (i = 0; i < devcd->num_files; i++) {
+		struct devcd_file *file = &devcd->files[i];
+
+		sysfs_bin_attr_init(&file->bin_attr);
+		file->bin_attr.attr.name = files[i].name;
+
+		file->bin_attr.attr.mode = 0600;
+		file->bin_attr.size = files[i].datalen;
+		file->bin_attr.private = files[i].data;
+		file->bin_attr.read = devcd_data_read;
+		file->bin_attr.write = devcd_data_write;
+
+		file->read = files[i].read;
+		file->free = files[i].free;
+	}
+
+	return devcd;
+}
+
 /**
- * dev_coredumpm - create device coredump with read/free methods
+ * dev_coredumpm_bulk - create a number of device coredump files
  * @dev: the struct device for the crashed device
  * @owner: the module that contains the read/free functions, use %THIS_MODULE
- * @data: data cookie for the @read/@free functions
- * @datalen: length of the data
  * @gfp: allocation flags
- * @read: function to read from the given buffer
- * @free: function to free the given buffer
+ * @files: the configuration of device coredump files
+ * @num_files: the number of device coredump files to create
  *
- * Creates a new device coredump for the given device. If a previous one hasn't
- * been read yet, the new coredump is discarded. The data lifetime is determined
- * by the device coredump framework and when it is no longer needed the @free
- * function will be called to free the data.
+ * This function allows drivers to create several device coredump files in
+ * one crashed device.
  */
-void dev_coredumpm(struct device *dev, struct module *owner,
-		   void *data, size_t datalen, gfp_t gfp,
-		   ssize_t (*read)(char *buffer, loff_t offset, size_t count,
-				   void *data, size_t datalen),
-		   void (*free)(void *data))
+void dev_coredumpm_bulk(struct device *dev, struct module *owner, gfp_t gfp,
+			struct dev_coredumpm_bulk_data *files, int num_files)
 {
 	static atomic_t devcd_count = ATOMIC_INIT(0);
 	struct devcd_entry *devcd;
 	struct device *existing;
+	int i;
 
 	if (devcd_disabled)
 		goto free;
@@ -272,15 +293,11 @@ void dev_coredumpm(struct device *dev, struct module *owner,
 	if (!try_module_get(owner))
 		goto free;
 
-	devcd = kzalloc(sizeof(*devcd), gfp);
+	devcd = devcd_alloc(files, num_files, gfp);
 	if (!devcd)
 		goto put_module;
 
 	devcd->owner = owner;
-	devcd->data = data;
-	devcd->datalen = datalen;
-	devcd->read = read;
-	devcd->free = free;
 	devcd->failing_dev = get_device(dev);
 
 	device_initialize(&devcd->devcd_dev);
@@ -292,6 +309,12 @@ void dev_coredumpm(struct device *dev, struct module *owner,
 	if (device_add(&devcd->devcd_dev))
 		goto put_device;
 
+	for (i = 0; i < devcd->num_files; i++) {
+		if (device_create_bin_file(&devcd->devcd_dev,
+					   &devcd->files[i].bin_attr))
+			/* nothing - some files will be missing */;
+	}
+
 	if (sysfs_create_link(&devcd->devcd_dev.kobj, &dev->kobj,
 			      "failing_device"))
 		/* nothing - symlink will be missing */;
@@ -309,7 +332,41 @@ void dev_coredumpm(struct device *dev, struct module *owner,
  put_module:
 	module_put(owner);
  free:
-	free(data);
+	for (i = 0; i < num_files; i++)
+		files[i].free(files[i].data);
+}
+EXPORT_SYMBOL_GPL(dev_coredumpm_bulk);
+
+/**
+ * dev_coredumpm - create device coredump with read/free methods
+ * @dev: the struct device for the crashed device
+ * @owner: the module that contains the read/free functions, use %THIS_MODULE
+ * @data: data cookie for the @read/@free functions
+ * @datalen: length of the data
+ * @gfp: allocation flags
+ * @read: function to read from the given buffer
+ * @free: function to free the given buffer
+ *
+ * Creates a new device coredump for the given device. If a previous one hasn't
+ * been read yet, the new coredump is discarded. The data lifetime is determined
+ * by the device coredump framework and when it is no longer needed the @free
+ * function will be called to free the data.
+ */
+void dev_coredumpm(struct device *dev, struct module *owner,
+		   void *data, size_t datalen, gfp_t gfp,
+		   ssize_t (*read)(char *buffer, loff_t offset, size_t count,
+				   void *data, size_t datalen),
+		   void (*free)(void *data))
+{
+	struct dev_coredumpm_bulk_data bulk_data = {
+		.name = "data",
+		.data = data,
+		.datalen = datalen,
+		.read = read,
+		.free = free,
+	};
+
+	dev_coredumpm_bulk(dev, owner, gfp, &bulk_data, 1);
 }
 EXPORT_SYMBOL_GPL(dev_coredumpm);
 
diff --git a/include/linux/devcoredump.h b/include/linux/devcoredump.h
index 269521f..9addb6f 100644
--- a/include/linux/devcoredump.h
+++ b/include/linux/devcoredump.h
@@ -65,6 +65,26 @@ static inline void _devcd_free_sgtable(struct scatterlist *table)
 	kfree(delete_iter);
 }
 
+/**
+ * struct dev_coredumpm_bulk_data - Data used for dev_coredumpm_bulk
+ *
+ * @name: coredump file name
+ * @data: data cookie for the @read/@free functions
+ * @datalen: length of the data
+ * @read: function to read from the given buffer
+ * @free: function to free the given buffer
+ *
+ * An array of this structure is passed as argument to dev_coredumpm_bulk, and
+ * used to describe each device coredump.
+ */
+struct dev_coredumpm_bulk_data {
+	char *name;
+	void *data;
+	size_t datalen;
+	ssize_t (*read)(char *buffer, loff_t offset, size_t count,
+			void *data, size_t datalen);
+	void (*free)(void *data);
+};
 
 #ifdef CONFIG_DEV_COREDUMP
 void dev_coredumpv(struct device *dev, void *data, size_t datalen,
@@ -76,6 +96,9 @@ void dev_coredumpm(struct device *dev, struct module *owner,
 				   void *data, size_t datalen),
 		   void (*free)(void *data));
 
+void dev_coredumpm_bulk(struct device *dev, struct module *owner, gfp_t gfp,
+			struct dev_coredumpm_bulk_data *files, int num_files);
+
 void dev_coredumpsg(struct device *dev, struct scatterlist *table,
 		    size_t datalen, gfp_t gfp);
 #else
@@ -95,6 +118,16 @@ dev_coredumpm(struct device *dev, struct module *owner,
 	free(data);
 }
 
+static inline
+void dev_coredumpm_bulk(struct device *dev, struct module *owner, gfp_t gfp,
+			struct dev_coredumpm_bulk_data *files, int num_files)
+{
+	int i;
+
+	for (i = 0; i < num_files; i++)
+		files[i].free(files[i].data);
+}
+
 static inline void dev_coredumpsg(struct device *dev, struct scatterlist *table,
 				  size_t datalen, gfp_t gfp)
 {
-- 
2.7.4



* [PATCH v2 4/7] nvme.h: add telemetry log page definitions
  2019-05-07 16:58 [PATCH v2 0/7] nvme-pci: support device coredump Akinobu Mita
                   ` (2 preceding siblings ...)
  2019-05-07 16:58 ` [PATCH v2 3/7] devcoredump: allow to create several coredump files in one device Akinobu Mita
@ 2019-05-07 16:58 ` Akinobu Mita
  2019-05-07 17:28   ` Heitke, Kenneth
  2019-05-07 17:52   ` Heitke, Kenneth
  2019-05-07 16:58 ` [PATCH v2 5/7] nvme: add facility to check log page attributes Akinobu Mita
                   ` (3 subsequent siblings)
  7 siblings, 2 replies; 22+ messages in thread
From: Akinobu Mita @ 2019-05-07 16:58 UTC (permalink / raw)
  To: linux-nvme, linux-kernel
  Cc: Akinobu Mita, Johannes Berg, Keith Busch, Jens Axboe,
	Christoph Hellwig, Sagi Grimberg, Minwoo Im

Copy telemetry log page definitions from nvme-cli.
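
As a rough sketch of how these definitions get consumed (mirroring the
later nvme-pci patch in this series): the last-block fields are zero-based
512-byte block indexes, and data area 3 is the outermost area, so it
bounds the size of the whole log, header block included:

	static size_t
	telemetry_log_size(const struct nvme_telemetry_log_page_hdr *hdr)
	{
		/* blocks are 512 bytes; +1 converts the zero-based index
		 * of the last block of data area 3 into a block count
		 */
		return (le16_to_cpu(hdr->dalb3) + 1) * 512;
	}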

Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
---
* v2
- New patch in this version.

 include/linux/nvme.h | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/include/linux/nvme.h b/include/linux/nvme.h
index c40720c..5217fe4 100644
--- a/include/linux/nvme.h
+++ b/include/linux/nvme.h
@@ -396,6 +396,28 @@ enum {
 	NVME_NIDT_UUID		= 0x03,
 };
 
+/* Derived from 1.3a Figure 101: Get Log Page – Telemetry Host
+ * -Initiated Log (Log Identifier 07h)
+ */
+struct nvme_telemetry_log_page_hdr {
+	__u8    lpi; /* Log page identifier */
+	__u8    rsvd[4];
+	__u8    iee_oui[3];
+	__le16  dalb1; /* Data area 1 last block */
+	__le16  dalb2; /* Data area 2 last block */
+	__le16  dalb3; /* Data area 3 last block */
+	__u8    rsvd1[368]; /* TODO verify */
+	__u8    ctrlavail; /* Controller initiated data avail?*/
+	__u8    ctrldgn; /* Controller initiated telemetry Data Gen # */
+	__u8    rsnident[128];
+	/* We'll have to double fetch so we can get the header,
+	 * parse dalb1->3 determine how much size we need for the
+	 * log then alloc below. Or just do a secondary non-struct
+	 * allocation.
+	 */
+	__u8    telemetry_dataarea[0];
+};
+
 struct nvme_smart_log {
 	__u8			critical_warning;
 	__u8			temperature[2];
@@ -832,6 +854,7 @@ enum {
 	NVME_LOG_FW_SLOT	= 0x03,
 	NVME_LOG_CHANGED_NS	= 0x04,
 	NVME_LOG_CMD_EFFECTS	= 0x05,
+	NVME_LOG_TELEMETRY_CTRL	= 0x08,
 	NVME_LOG_ANA		= 0x0c,
 	NVME_LOG_DISC		= 0x70,
 	NVME_LOG_RESERVATION	= 0x80,
-- 
2.7.4



* [PATCH v2 5/7] nvme: add facility to check log page attributes
  2019-05-07 16:58 [PATCH v2 0/7] nvme-pci: support device coredump Akinobu Mita
                   ` (3 preceding siblings ...)
  2019-05-07 16:58 ` [PATCH v2 4/7] nvme.h: add telemetry log page definitions Akinobu Mita
@ 2019-05-07 16:58 ` Akinobu Mita
  2019-05-07 16:58 ` [PATCH v2 6/7] nvme-pci: add device coredump support Akinobu Mita
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 22+ messages in thread
From: Akinobu Mita @ 2019-05-07 16:58 UTC (permalink / raw)
  To: linux-nvme, linux-kernel
  Cc: Akinobu Mita, Johannes Berg, Keith Busch, Jens Axboe,
	Christoph Hellwig, Sagi Grimberg, Minwoo Im

This provides a facility to check whether the controller supports the
telemetry log pages and log page offset field for the Get Log Page
command.
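
Callers are expected to gate on these bits before issuing the command;
a minimal sketch (the same checks the later coredump patch performs):

	/* LPA bit 3: telemetry log pages supported;
	 * LPA bit 2: extended data for Get Log Page (the Log Page Offset
	 * needed to read logs larger than one transfer)
	 */
	if (!(ctrl->lpa & NVME_CTRL_LPA_TELEMETRY_LOG) ||
	    !(ctrl->lpa & NVME_CTRL_LPA_EXTENDED_DATA))
		return -EINVAL;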

Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
---
* v2
- New patch in this version.

 drivers/nvme/host/core.c | 1 +
 drivers/nvme/host/nvme.h | 1 +
 include/linux/nvme.h     | 2 ++
 3 files changed, 4 insertions(+)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 6265d92..42f09d6 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2580,6 +2580,7 @@ int nvme_init_identify(struct nvme_ctrl *ctrl)
 	} else
 		ctrl->shutdown_timeout = shutdown_timeout;
 
+	ctrl->lpa = id->lpa;
 	ctrl->npss = id->npss;
 	ctrl->apsta = id->apsta;
 	prev_apst_enabled = ctrl->apst_enabled;
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 527d645..8711c71 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -195,6 +195,7 @@ struct nvme_ctrl {
 	u32 vs;
 	u32 sgls;
 	u16 kas;
+	u8 lpa;
 	u8 npss;
 	u8 apsta;
 	u32 oaes;
diff --git a/include/linux/nvme.h b/include/linux/nvme.h
index 5217fe4..c1c4ca5 100644
--- a/include/linux/nvme.h
+++ b/include/linux/nvme.h
@@ -294,6 +294,8 @@ enum {
 	NVME_CTRL_OACS_DIRECTIVES		= 1 << 5,
 	NVME_CTRL_OACS_DBBUF_SUPP		= 1 << 8,
 	NVME_CTRL_LPA_CMD_EFFECTS_LOG		= 1 << 1,
+	NVME_CTRL_LPA_EXTENDED_DATA		= 1 << 2,
+	NVME_CTRL_LPA_TELEMETRY_LOG		= 1 << 3,
 };
 
 struct nvme_lbaf {
-- 
2.7.4



* [PATCH v2 6/7] nvme-pci: add device coredump support
  2019-05-07 16:58 [PATCH v2 0/7] nvme-pci: support device coredump Akinobu Mita
                   ` (4 preceding siblings ...)
  2019-05-07 16:58 ` [PATCH v2 5/7] nvme: add facility to check log page attributes Akinobu Mita
@ 2019-05-07 16:58 ` Akinobu Mita
  2019-05-07 17:07   ` Keith Busch
                     ` (3 more replies)
  2019-05-07 16:58 ` [PATCH v2 7/7] nvme-pci: trigger device coredump on command timeout Akinobu Mita
  2019-05-07 17:43 ` [PATCH v2 0/7] nvme-pci: support device coredump Heitke, Kenneth
  7 siblings, 4 replies; 22+ messages in thread
From: Akinobu Mita @ 2019-05-07 16:58 UTC (permalink / raw)
  To: linux-nvme, linux-kernel
  Cc: Akinobu Mita, Johannes Berg, Keith Busch, Jens Axboe,
	Christoph Hellwig, Sagi Grimberg, Minwoo Im

This enables to capture snapshot of controller information via device
coredump machanism.

The nvme device coredump creates the following coredump files.

- regs: NVMe controller registers (00h to 4Fh)
- sq<qid>: Submission queue
- cq<qid>: Completion queue
- telemetry-ctrl-log: Telemetry controller-initiated log (if available)
- data: Empty

The empty 'data' file exists to provide a uniform way to notify that a
device coredump is no longer needed: userspace writes to the 'data' file
to release the dump.

Since all existing drivers using the device coredump provide a 'data'
file, if the nvme device coredump didn't provide one, userspace programs
would need to know which driver provides which coredump files.
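
For illustration, a hypothetical userspace collector might read the
register dump like this (the devcd1 index is made up; it depends on how
many dumps have been created so far):

	#include <stdint.h>
	#include <stdio.h>
	#include <string.h>

	int main(void)
	{
		uint8_t regs[0x50];	/* controller registers 00h-4Fh */
		uint64_t cap;
		uint32_t vs, csts;
		FILE *f = fopen("/sys/class/devcoredump/devcd1/regs", "rb");

		if (!f || fread(regs, 1, sizeof(regs), f) != sizeof(regs))
			return 1;
		fclose(f);

		memcpy(&cap,  &regs[0x00], sizeof(cap));	/* CAP */
		memcpy(&vs,   &regs[0x08], sizeof(vs));		/* VS */
		memcpy(&csts, &regs[0x1c], sizeof(csts));	/* CSTS */

		printf("CAP=%#llx VS=%#x CSTS=%#x\n",
		       (unsigned long long)cap, vs, csts);

		return 0;
	}

Writing anything back to the 'data' file afterwards releases the dump.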

Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
---
* v2
- Exclude the doorbell registers from register dump.
- Save controller registers in a binary format instead of a text format.
- Create an empty 'data' file in the device coredump.
- Save telemetry controller-initiated log if available

 drivers/nvme/host/Kconfig |   1 +
 drivers/nvme/host/core.c  |   1 +
 drivers/nvme/host/pci.c   | 425 ++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 427 insertions(+)

diff --git a/drivers/nvme/host/Kconfig b/drivers/nvme/host/Kconfig
index 0f345e2..c3a06af 100644
--- a/drivers/nvme/host/Kconfig
+++ b/drivers/nvme/host/Kconfig
@@ -5,6 +5,7 @@ config BLK_DEV_NVME
 	tristate "NVM Express block device"
 	depends on PCI && BLOCK
 	select NVME_CORE
+	select WANT_DEV_COREDUMP
 	---help---
 	  The NVM Express driver is for solid state drives directly
 	  connected to the PCI or PCI Express bus.  If you know you
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 42f09d6..8d297c7 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2457,6 +2457,7 @@ int nvme_get_log(struct nvme_ctrl *ctrl, u32 nsid, u8 log_page, u8 lsp,
 
 	return nvme_submit_sync_cmd(ctrl->admin_q, &c, log, size);
 }
+EXPORT_SYMBOL_GPL(nvme_get_log);
 
 static int nvme_get_effects_log(struct nvme_ctrl *ctrl)
 {
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index a90cf5d..4684a86 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -9,6 +9,7 @@
 #include <linux/blkdev.h>
 #include <linux/blk-mq.h>
 #include <linux/blk-mq-pci.h>
+#include <linux/devcoredump.h>
 #include <linux/dmi.h>
 #include <linux/init.h>
 #include <linux/interrupt.h>
@@ -131,6 +132,9 @@ struct nvme_dev {
 	dma_addr_t host_mem_descs_dma;
 	struct nvme_host_mem_buf_desc *host_mem_descs;
 	void **host_mem_desc_bufs;
+
+	struct dev_coredumpm_bulk_data *dumps;
+	int num_dumps;
 };
 
 static int io_queue_depth_set(const char *val, const struct kernel_param *kp)
@@ -2867,6 +2871,426 @@ static int nvme_resume(struct device *dev)
 
 static SIMPLE_DEV_PM_OPS(nvme_dev_pm_ops, nvme_suspend, nvme_resume);
 
+#ifdef CONFIG_DEV_COREDUMP
+
+static ssize_t nvme_coredump_read(char *buffer, loff_t offset, size_t count,
+				  void *data, size_t datalen)
+{
+	return memory_read_from_buffer(buffer, count, &offset, data, datalen);
+}
+
+static void nvme_coredump_free(void *data)
+{
+	kvfree(data);
+}
+
+static int nvme_coredump_empty(struct dev_coredumpm_bulk_data *data)
+{
+	data->name = kstrdup("data", GFP_KERNEL);
+	if (!data->name)
+		return -ENOMEM;
+
+	data->data = NULL;
+	data->datalen = 0;
+	data->read = nvme_coredump_read;
+	data->free = nvme_coredump_free;
+
+	return 0;
+}
+
+static int nvme_coredump_regs(struct dev_coredumpm_bulk_data *data,
+			      struct nvme_ctrl *ctrl)
+{
+	const int reg_size = 0x50; /* 00h to 4Fh */
+
+	data->name = kstrdup("regs", GFP_KERNEL);
+	if (!data->name)
+		return -ENOMEM;
+
+	data->data = kvzalloc(reg_size, GFP_KERNEL);
+	if (!data->data) {
+		kfree(data->name);
+		return -ENOMEM;
+	}
+	memcpy_fromio(data->data, to_nvme_dev(ctrl)->bar, reg_size);
+
+	data->datalen = reg_size;
+	data->read = nvme_coredump_read;
+	data->free = nvme_coredump_free;
+
+	return 0;
+}
+
+static void *kvmemdup(const void *src, size_t len, gfp_t gfp)
+{
+	void *p;
+
+	p = kvmalloc(len, gfp);
+	if (p)
+		memcpy(p, src, len);
+
+	return p;
+}
+
+static int nvme_coredump_queues(struct dev_coredumpm_bulk_data *bulk_data,
+				struct nvme_ctrl *ctrl)
+{
+	int i;
+
+	for (i = 0; i < ctrl->queue_count; i++) {
+		struct dev_coredumpm_bulk_data *data = &bulk_data[2 * i];
+		struct nvme_queue *nvmeq = &to_nvme_dev(ctrl)->queues[i];
+
+		data[0].name = kasprintf(GFP_KERNEL, "sq%d", i);
+		data[0].data = kvmemdup(nvmeq->sq_cmds,
+					SQ_SIZE(nvmeq->q_depth), GFP_KERNEL);
+		data[0].datalen = SQ_SIZE(nvmeq->q_depth);
+		data[0].read = nvme_coredump_read;
+		data[0].free = nvme_coredump_free;
+
+		data[1].name = kasprintf(GFP_KERNEL, "cq%d", i);
+		data[1].data = kvmemdup((void *)nvmeq->cqes,
+					CQ_SIZE(nvmeq->q_depth), GFP_KERNEL);
+		data[1].datalen = CQ_SIZE(nvmeq->q_depth);
+		data[1].read = nvme_coredump_read;
+		data[1].free = nvme_coredump_free;
+
+		if (!data[0].name || !data[1].name ||
+		    !data[0].data || !data[1].data)
+			goto free;
+	}
+
+	return 0;
+free:
+	for (; i >= 0; i--) {
+		struct dev_coredumpm_bulk_data *data = &bulk_data[2 * i];
+
+		kfree(data[0].name);
+		kfree(data[1].name);
+		kvfree(data[0].data);
+		kvfree(data[1].data);
+	}
+
+	return -ENOMEM;
+}
+
+static struct
+dev_coredumpm_bulk_data *nvme_coredump_alloc(struct nvme_dev *dev, int n)
+{
+	struct dev_coredumpm_bulk_data *data;
+
+	data = krealloc(dev->dumps, sizeof(*data) * (dev->num_dumps + n),
+			GFP_KERNEL | __GFP_ZERO);
+	if (!data)
+		return NULL;
+
+	dev->dumps = data;
+	data += dev->num_dumps;
+	dev->num_dumps += n;
+
+	return data;
+}
+
+static void __nvme_coredump_discard(struct nvme_dev *dev, bool free_data)
+{
+	int i;
+
+	for (i = 0; i < dev->num_dumps; i++) {
+		kfree(dev->dumps[i].name);
+		if (free_data)
+			dev->dumps[i].free(dev->dumps[i].data);
+	}
+
+	kfree(dev->dumps);
+	dev->dumps = NULL;
+	dev->num_dumps = 0;
+}
+
+static void nvme_coredump_discard(struct nvme_dev *dev)
+{
+	__nvme_coredump_discard(dev, true);
+}
+
+static void nvme_coredump_clear(struct nvme_dev *dev)
+{
+	__nvme_coredump_discard(dev, false);
+}
+
+static int nvme_coredump_prologue(struct nvme_dev *dev)
+{
+	struct nvme_ctrl *ctrl = &dev->ctrl;
+	struct dev_coredumpm_bulk_data *bulk_data;
+	int ret;
+
+	if (WARN_ON(dev->dumps))
+		nvme_coredump_discard(dev);
+
+	bulk_data = nvme_coredump_alloc(dev, 2 + 2 * ctrl->queue_count);
+	if (!bulk_data)
+		return -ENOMEM;
+
+	ret = nvme_coredump_empty(&bulk_data[0]);
+	if (ret)
+		goto free_bulk_data;
+
+	ret = nvme_coredump_regs(&bulk_data[1], ctrl);
+	if (ret)
+		goto free_bulk_data;
+
+	ret = nvme_coredump_queues(&bulk_data[2], ctrl);
+	if (ret)
+		goto free_bulk_data;
+
+	return 0;
+
+free_bulk_data:
+	nvme_coredump_discard(dev);
+
+	return -ENOMEM;
+}
+
+static ssize_t nvme_coredump_read_sgtable(char *buffer, loff_t offset,
+					  size_t buf_len, void *data,
+					  size_t data_len)
+{
+	struct sg_table *table = data;
+
+	if (data_len <= offset)
+		return 0;
+
+	if (offset + buf_len > data_len)
+		buf_len = data_len - offset;
+
+	return sg_pcopy_to_buffer(table->sgl, sg_nents(table->sgl), buffer,
+				  buf_len, offset);
+}
+
+static void nvme_coredump_free_sgtable(void *data)
+{
+	struct sg_table *table = data;
+	struct scatterlist *sg;
+	int n = sg_nents(table->sgl);
+	int i;
+
+	for_each_sg(table->sgl, sg, n, i) {
+		struct page *page = sg_page(sg);
+
+		if (page)
+			__free_page(page);
+
+	}
+
+	kfree(table);
+}
+
+static struct sg_table *nvme_coredump_alloc_sgtable(size_t bytes)
+{
+	struct sg_table *table;
+	struct scatterlist *sg;
+	int n = DIV_ROUND_UP(bytes, PAGE_SIZE);
+	int i;
+
+	table = kzalloc(sizeof(*table), GFP_KERNEL);
+	if (!table)
+		return NULL;
+
+	if (sg_alloc_table(table, n, GFP_KERNEL))
+		goto free_table;
+
+	for_each_sg(table->sgl, sg, n, i) {
+		struct page *page = alloc_page(GFP_KERNEL);
+		size_t size = min_t(int, bytes, PAGE_SIZE);
+
+		if (!page)
+			goto free_page;
+
+		sg_set_page(sg, page, size, 0);
+		bytes -= size;
+	}
+
+	return table;
+free_page:
+	for_each_sg(table->sgl, sg, n, i) {
+		struct page *page = sg_page(sg);
+
+		if (page)
+			__free_page(page);
+
+	}
+free_table:
+	kfree(table);
+
+	return NULL;
+}
+
+static int nvme_get_telemetry_log_blocks(struct nvme_ctrl *ctrl, void *buf,
+					 size_t bytes, loff_t offset)
+{
+	const size_t chunk_size = ctrl->max_hw_sectors * ctrl->page_size;
+	loff_t pos = 0;
+
+	while (pos < bytes) {
+		size_t size = min_t(size_t, bytes - pos, chunk_size);
+		int ret;
+
+		ret = nvme_get_log(ctrl, NVME_NSID_ALL, NVME_LOG_TELEMETRY_CTRL,
+				   0, buf + pos, size, offset + pos);
+		if (ret)
+			return ret;
+
+		pos += size;
+	}
+
+	return 0;
+}
+
+static int nvme_get_telemetry_log(struct nvme_ctrl *ctrl,
+				  struct sg_table *table, size_t bytes)
+{
+	int n = sg_nents(table->sgl);
+	struct scatterlist *sg;
+	size_t offset = 0;
+	int i;
+
+	for_each_sg(table->sgl, sg, n, i) {
+		struct page *page = sg_page(sg);
+		size_t size = min_t(int, bytes - offset, sg->length);
+		int ret;
+
+		ret = nvme_get_telemetry_log_blocks(ctrl, page_address(page),
+						    size, offset);
+		if (ret)
+			return ret;
+
+		offset += size;
+	}
+
+	return 0;
+}
+
+static int nvme_coredump_telemetry_log(struct dev_coredumpm_bulk_data *data,
+				       struct nvme_ctrl *ctrl)
+{
+	struct nvme_telemetry_log_page_hdr *page_hdr;
+	struct sg_table *table;
+	int log_size;
+	int ret;
+	u8 ctrldgn;
+
+	if (!(ctrl->lpa & NVME_CTRL_LPA_TELEMETRY_LOG))
+		return -EINVAL;
+	if (!(ctrl->lpa & NVME_CTRL_LPA_EXTENDED_DATA))
+		return -EINVAL;
+
+	page_hdr = kzalloc(sizeof(*page_hdr), GFP_KERNEL);
+	if (!page_hdr)
+		return -ENOMEM;
+
+	ret = nvme_get_log(ctrl, NVME_NSID_ALL, NVME_LOG_TELEMETRY_CTRL, 0,
+			   page_hdr, sizeof(*page_hdr), 0);
+	if (ret)
+		goto free_page_hdr;
+
+	if (!page_hdr->ctrlavail) {
+		ret = -EINVAL;
+		goto free_page_hdr;
+	}
+
+	log_size = (le16_to_cpu(page_hdr->dalb3) + 1) * 512;
+
+	table = nvme_coredump_alloc_sgtable(log_size);
+	if (!table) {
+		ret = -ENOMEM;
+		goto free_page_hdr;
+	}
+
+	ret = nvme_get_telemetry_log(ctrl, table, log_size);
+	if (ret)
+		goto free_table;
+
+	sg_pcopy_to_buffer(table->sgl, sg_nents(table->sgl), &ctrldgn,
+			   sizeof(ctrldgn),
+			   offsetof(typeof(*page_hdr), ctrldgn));
+
+	ret = nvme_get_log(ctrl, NVME_NSID_ALL, NVME_LOG_TELEMETRY_CTRL, 0,
+			   page_hdr, sizeof(*page_hdr), 0);
+	if (ret)
+		goto free_table;
+
+	if (page_hdr->ctrldgn != ctrldgn) {
+		ret = -EINVAL;
+		goto free_table;
+	}
+
+	data->name = kstrdup("telemetry-ctrl-log", GFP_KERNEL);
+	if (!data->name) {
+		ret = -ENOMEM;
+		goto free_table;
+	}
+
+	data->data = table;
+	data->datalen = log_size;
+	data->read = nvme_coredump_read_sgtable;
+	data->free = nvme_coredump_free_sgtable;
+
+	kfree(page_hdr);
+
+	return 0;
+free_table:
+	nvme_coredump_free_sgtable(table);
+free_page_hdr:
+	kfree(page_hdr);
+
+	return ret;
+}
+
+static void nvme_coredump_epilogue(struct nvme_dev *dev)
+{
+	struct dev_coredumpm_bulk_data *bulk_data;
+
+	if (!dev->dumps)
+		return;
+
+	bulk_data = nvme_coredump_alloc(dev, 1);
+	if (bulk_data) {
+		if (nvme_coredump_telemetry_log(bulk_data, &dev->ctrl))
+			dev->num_dumps--;
+	}
+
+	dev_coredumpm_bulk(dev->dev, THIS_MODULE, GFP_KERNEL,
+			   dev->dumps, dev->num_dumps);
+	nvme_coredump_clear(dev);
+}
+
+static void nvme_coredump(struct device *dev)
+{
+	struct nvme_dev *ndev = dev_get_drvdata(dev);
+
+	mutex_lock(&ndev->shutdown_lock);
+
+	nvme_coredump_prologue(ndev);
+	nvme_coredump_epilogue(ndev);
+
+	mutex_unlock(&ndev->shutdown_lock);
+}
+
+#else
+
+static int nvme_coredump_prologue(struct nvme_dev *dev)
+{
+	return 0;
+}
+
+static void nvme_coredump_epilogue(struct nvme_dev *dev)
+{
+}
+
+static void nvme_coredump(struct device *dev)
+{
+}
+
+#endif /* CONFIG_DEV_COREDUMP */
+
 static pci_ers_result_t nvme_error_detected(struct pci_dev *pdev,
 						pci_channel_state_t state)
 {
@@ -2972,6 +3396,7 @@ static struct pci_driver nvme_driver = {
 	.shutdown	= nvme_shutdown,
 	.driver		= {
 		.pm	= &nvme_dev_pm_ops,
+		.coredump = nvme_coredump,
 	},
 	.sriov_configure = pci_sriov_configure_simple,
 	.err_handler	= &nvme_err_handler,
-- 
2.7.4



* [PATCH v2 7/7] nvme-pci: trigger device coredump on command timeout
  2019-05-07 16:58 [PATCH v2 0/7] nvme-pci: support device coredump Akinobu Mita
                   ` (5 preceding siblings ...)
  2019-05-07 16:58 ` [PATCH v2 6/7] nvme-pci: add device coredump support Akinobu Mita
@ 2019-05-07 16:58 ` Akinobu Mita
  2019-05-07 17:43 ` [PATCH v2 0/7] nvme-pci: support device coredump Heitke, Kenneth
  7 siblings, 0 replies; 22+ messages in thread
From: Akinobu Mita @ 2019-05-07 16:58 UTC (permalink / raw)
  To: linux-nvme, linux-kernel
  Cc: Akinobu Mita, Johannes Berg, Keith Busch, Jens Axboe,
	Christoph Hellwig, Sagi Grimberg, Minwoo Im

This enables the nvme driver to trigger a device coredump when a command
timeout occurs, which helps diagnose and debug issues.

This can be tested with fail_io_timeout fault injection.

	# echo 1 > /sys/kernel/debug/fail_io_timeout/probability
	# echo 1 > /sys/kernel/debug/fail_io_timeout/times
	# echo 1 > /sys/block/nvme0n1/io-timeout-fail
	# dd if=/dev/nvme0n1 of=/dev/null
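
(The knobs above require a kernel built with CONFIG_FAIL_IO_TIMEOUT and
CONFIG_FAULT_INJECTION_DEBUG_FS.)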

Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
---
* v2
- Make coredump procedure into two phases (before resetting controller and
  after resetting as soon as admin queue is available).

 drivers/nvme/host/pci.c | 35 ++++++++++++++++++++++-------------
 1 file changed, 22 insertions(+), 13 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 4684a86..4ff918f 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -87,9 +87,12 @@ MODULE_PARM_DESC(poll_queues, "Number of queues to use for polled IO.");
 struct nvme_dev;
 struct nvme_queue;
 
-static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown);
+static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown, bool dump);
 static bool __nvme_disable_io_queues(struct nvme_dev *dev, u8 opcode);
 
+static int nvme_coredump_prologue(struct nvme_dev *dev);
+static void nvme_coredump_epilogue(struct nvme_dev *dev);
+
 /*
  * Represents an NVM Express device.  Each nvme_dev is a PCI function.
  */
@@ -1289,7 +1292,7 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
 	 */
 	if (nvme_should_reset(dev, csts)) {
 		nvme_warn_reset(dev, csts);
-		nvme_dev_disable(dev, false);
+		nvme_dev_disable(dev, false, true);
 		nvme_reset_ctrl(&dev->ctrl);
 		return BLK_EH_DONE;
 	}
@@ -1316,7 +1319,7 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
 		dev_warn_ratelimited(dev->ctrl.device,
 			 "I/O %d QID %d timeout, disable controller\n",
 			 req->tag, nvmeq->qid);
-		nvme_dev_disable(dev, false);
+		nvme_dev_disable(dev, false, true);
 		nvme_req(req)->flags |= NVME_REQ_CANCELLED;
 		return BLK_EH_DONE;
 	default:
@@ -1332,7 +1335,7 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
 		dev_warn(dev->ctrl.device,
 			 "I/O %d QID %d timeout, reset controller\n",
 			 req->tag, nvmeq->qid);
-		nvme_dev_disable(dev, false);
+		nvme_dev_disable(dev, false, true);
 		nvme_reset_ctrl(&dev->ctrl);
 
 		nvme_req(req)->flags |= NVME_REQ_CANCELLED;
@@ -2399,7 +2402,7 @@ static void nvme_pci_disable(struct nvme_dev *dev)
 	}
 }
 
-static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
+static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown, bool dump)
 {
 	bool dead = true;
 	struct pci_dev *pdev = to_pci_dev(dev->dev);
@@ -2424,6 +2427,9 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
 			nvme_wait_freeze_timeout(&dev->ctrl, NVME_IO_TIMEOUT);
 	}
 
+	if (dump)
+		nvme_coredump_prologue(dev);
+
 	nvme_stop_queues(&dev->ctrl);
 
 	if (!dead && dev->ctrl.queue_count > 0) {
@@ -2491,7 +2497,7 @@ static void nvme_remove_dead_ctrl(struct nvme_dev *dev, int status)
 	dev_warn(dev->ctrl.device, "Removing after probe failure status: %d\n", status);
 
 	nvme_get_ctrl(&dev->ctrl);
-	nvme_dev_disable(dev, false);
+	nvme_dev_disable(dev, false, false);
 	nvme_kill_queues(&dev->ctrl);
 	if (!queue_work(nvme_wq, &dev->remove_work))
 		nvme_put_ctrl(&dev->ctrl);
@@ -2513,7 +2519,7 @@ static void nvme_reset_work(struct work_struct *work)
 	 * moving on.
 	 */
 	if (dev->ctrl.ctrl_config & NVME_CC_ENABLE)
-		nvme_dev_disable(dev, false);
+		nvme_dev_disable(dev, false, false);
 
 	mutex_lock(&dev->shutdown_lock);
 	result = nvme_pci_enable(dev);
@@ -2550,6 +2556,8 @@ static void nvme_reset_work(struct work_struct *work)
 	if (result)
 		goto out;
 
+	nvme_coredump_epilogue(dev);
+
 	if (dev->ctrl.oacs & NVME_CTRL_OACS_SEC_SUPP) {
 		if (!dev->ctrl.opal_dev)
 			dev->ctrl.opal_dev =
@@ -2612,6 +2620,7 @@ static void nvme_reset_work(struct work_struct *work)
  out_unlock:
 	mutex_unlock(&dev->shutdown_lock);
  out:
+	nvme_coredump_epilogue(dev);
 	nvme_remove_dead_ctrl(dev, result);
 }
 
@@ -2802,7 +2811,7 @@ static int nvme_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 static void nvme_reset_prepare(struct pci_dev *pdev)
 {
 	struct nvme_dev *dev = pci_get_drvdata(pdev);
-	nvme_dev_disable(dev, false);
+	nvme_dev_disable(dev, false, false);
 }
 
 static void nvme_reset_done(struct pci_dev *pdev)
@@ -2814,7 +2823,7 @@ static void nvme_reset_done(struct pci_dev *pdev)
 static void nvme_shutdown(struct pci_dev *pdev)
 {
 	struct nvme_dev *dev = pci_get_drvdata(pdev);
-	nvme_dev_disable(dev, true);
+	nvme_dev_disable(dev, true, false);
 }
 
 /*
@@ -2831,14 +2840,14 @@ static void nvme_remove(struct pci_dev *pdev)
 
 	if (!pci_device_is_present(pdev)) {
 		nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_DEAD);
-		nvme_dev_disable(dev, true);
+		nvme_dev_disable(dev, true, false);
 		nvme_dev_remove_admin(dev);
 	}
 
 	flush_work(&dev->ctrl.reset_work);
 	nvme_stop_ctrl(&dev->ctrl);
 	nvme_remove_namespaces(&dev->ctrl);
-	nvme_dev_disable(dev, true);
+	nvme_dev_disable(dev, true, false);
 	nvme_release_cmb(dev);
 	nvme_free_host_mem(dev);
 	nvme_dev_remove_admin(dev);
@@ -2855,7 +2864,7 @@ static int nvme_suspend(struct device *dev)
 	struct pci_dev *pdev = to_pci_dev(dev);
 	struct nvme_dev *ndev = pci_get_drvdata(pdev);
 
-	nvme_dev_disable(ndev, true);
+	nvme_dev_disable(ndev, true, false);
 	return 0;
 }
 
@@ -3307,7 +3316,7 @@ static pci_ers_result_t nvme_error_detected(struct pci_dev *pdev,
 	case pci_channel_io_frozen:
 		dev_warn(dev->ctrl.device,
 			"frozen state error detected, reset controller\n");
-		nvme_dev_disable(dev, false);
+		nvme_dev_disable(dev, false, false);
 		return PCI_ERS_RESULT_NEED_RESET;
 	case pci_channel_io_perm_failure:
 		dev_warn(dev->ctrl.device,
-- 
2.7.4



* Re: [PATCH v2 6/7] nvme-pci: add device coredump support
  2019-05-07 16:58 ` [PATCH v2 6/7] nvme-pci: add device coredump support Akinobu Mita
@ 2019-05-07 17:07   ` Keith Busch
  2019-05-07 17:44   ` Heitke, Kenneth
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 22+ messages in thread
From: Keith Busch @ 2019-05-07 17:07 UTC (permalink / raw)
  To: Akinobu Mita
  Cc: linux-nvme, linux-kernel, Johannes Berg, Keith Busch, Jens Axboe,
	Christoph Hellwig, Sagi Grimberg, Minwoo Im

On Wed, May 08, 2019 at 01:58:33AM +0900, Akinobu Mita wrote:
> +static void nvme_coredump(struct device *dev)
> +{
> +	struct nvme_dev *ndev = dev_get_drvdata(dev);
> +
> +	mutex_lock(&ndev->shutdown_lock);
> +
> +	nvme_coredump_prologue(ndev);
> +	nvme_coredump_epilogue(ndev);
> +
> +	mutex_unlock(&ndev->shutdown_lock);
> +}

This is a bit of a mine field. The shutdown_lock is held when reclaiming
requests that didn't see a response. If you're holding it here and your
telemetry log page times out, we're going to deadlock. And since the
controller is probably in a buggered state when you try to retrieve one,
I would guess an unrecoverable timeout is the most likely outcome.
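
Concretely, the feared sequence is: nvme_coredump() takes shutdown_lock,
the telemetry Get Log Page never completes, the timeout handler runs
nvme_dev_disable(), and nvme_dev_disable() blocks trying to take
shutdown_lock while the coredump path is still waiting on the command:
a deadlock.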


* Re: [PATCH v2 4/7] nvme.h: add telemetry log page definitions
  2019-05-07 16:58 ` [PATCH v2 4/7] nvme.h: add telemetry log page definitions Akinobu Mita
@ 2019-05-07 17:28   ` Heitke, Kenneth
  2019-05-08 15:42     ` Akinobu Mita
  2019-05-07 17:52   ` Heitke, Kenneth
  1 sibling, 1 reply; 22+ messages in thread
From: Heitke, Kenneth @ 2019-05-07 17:28 UTC (permalink / raw)
  To: Akinobu Mita, linux-nvme, linux-kernel
  Cc: Jens Axboe, Sagi Grimberg, Keith Busch, Minwoo Im, Johannes Berg,
	Christoph Hellwig



On 5/7/2019 10:58 AM, Akinobu Mita wrote:
> Copy telemetry log page definitions from nvme-cli.
> 
> Cc: Johannes Berg <johannes@sipsolutions.net>
> Cc: Keith Busch <keith.busch@intel.com>
> Cc: Jens Axboe <axboe@fb.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Sagi Grimberg <sagi@grimberg.me>
> Cc: Minwoo Im <minwoo.im.dev@gmail.com>
> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
> ---
> * v2
> - New patch in this version.
> 
>   include/linux/nvme.h | 23 +++++++++++++++++++++++
>   1 file changed, 23 insertions(+)
> 
> diff --git a/include/linux/nvme.h b/include/linux/nvme.h
> index c40720c..5217fe4 100644
> --- a/include/linux/nvme.h
> +++ b/include/linux/nvme.h
> @@ -396,6 +396,28 @@ enum {
>   	NVME_NIDT_UUID		= 0x03,
>   };
>   
> +/* Derived from 1.3a Figure 101: Get Log Page – Telemetry Host
> + * -Initiated Log (Log Identifier 07h)
> + */
> +struct nvme_telemetry_log_page_hdr {
> +	__u8    lpi; /* Log page identifier */
> +	__u8    rsvd[4];
> +	__u8    iee_oui[3];
> +	__le16  dalb1; /* Data area 1 last block */
> +	__le16  dalb2; /* Data area 2 last block */
> +	__le16  dalb3; /* Data area 3 last block */
> +	__u8    rsvd1[368]; /* TODO verify */

Remove the TODO

> +	__u8    ctrlavail; /* Controller initiated data avail?*/
> +	__u8    ctrldgn; /* Controller initiated telemetry Data Gen # */
> +	__u8    rsnident[128];
> +	/* We'll have to double fetch so we can get the header,
> +	 * parse dalb1->3 determine how much size we need for the
> +	 * log then alloc below. Or just do a secondary non-struct
> +	 * allocation.
> +	 */

This comment isn't necessary. You usually can't read the entire
telemetry log at once and the header is a fixed size. You would likely
just read the header followed by reads of the different data areas.

> +	__u8    telemetry_dataarea[0];
> +};
> +
>   struct nvme_smart_log {
>   	__u8			critical_warning;
>   	__u8			temperature[2];
> @@ -832,6 +854,7 @@ enum {
>   	NVME_LOG_FW_SLOT	= 0x03,
>   	NVME_LOG_CHANGED_NS	= 0x04,
>   	NVME_LOG_CMD_EFFECTS	= 0x05,
> +	NVME_LOG_TELEMETRY_CTRL	= 0x08,
>   	NVME_LOG_ANA		= 0x0c,
>   	NVME_LOG_DISC		= 0x70,
>   	NVME_LOG_RESERVATION	= 0x80,
> 


* Re: [PATCH v2 3/7] devcoredump: allow to create several coredump files in one device
  2019-05-07 16:58 ` [PATCH v2 3/7] devcoredump: allow to create several coredump files in one device Akinobu Mita
@ 2019-05-07 17:35   ` Heitke, Kenneth
  2019-05-08 15:40     ` Akinobu Mita
  0 siblings, 1 reply; 22+ messages in thread
From: Heitke, Kenneth @ 2019-05-07 17:35 UTC (permalink / raw)
  To: Akinobu Mita, linux-nvme, linux-kernel
  Cc: Jens Axboe, Sagi Grimberg, Keith Busch, Minwoo Im, Johannes Berg,
	Christoph Hellwig



On 5/7/2019 10:58 AM, Akinobu Mita wrote:
> @@ -292,6 +309,12 @@ void dev_coredumpm(struct device *dev, struct module *owner,
>   	if (device_add(&devcd->devcd_dev))
>   		goto put_device;
>   
> +	for (i = 0; i < devcd->num_files; i++) {
> +		if (device_create_bin_file(&devcd->devcd_dev,
> +					   &devcd->files[i].bin_attr))
> +			/* nothing - some files will be missing */;

Is the conditional necessary if you aren't going to do anything?

> +	}
> +
>   	if (sysfs_create_link(&devcd->devcd_dev.kobj, &dev->kobj,
>   			      "failing_device"))
>   		/* nothing - symlink will be missing */;


* Re: [PATCH v2 0/7] nvme-pci: support device coredump
  2019-05-07 16:58 [PATCH v2 0/7] nvme-pci: support device coredump Akinobu Mita
                   ` (6 preceding siblings ...)
  2019-05-07 16:58 ` [PATCH v2 7/7] nvme-pci: trigger device coredump on command timeout Akinobu Mita
@ 2019-05-07 17:43 ` Heitke, Kenneth
  7 siblings, 0 replies; 22+ messages in thread
From: Heitke, Kenneth @ 2019-05-07 17:43 UTC (permalink / raw)
  To: Akinobu Mita, linux-nvme, linux-kernel
  Cc: Jens Axboe, Sagi Grimberg, Keith Busch, Minwoo Im, Johannes Berg,
	Christoph Hellwig



On 5/7/2019 10:58 AM, Akinobu Mita wrote:
> This enables to capture snapshot of controller information via device
> coredump machanism, and it helps diagnose and debug issues.

s/machanism/mechanism/
> 
> The nvme device coredump is triggered when command timeout occurs, and
> creates the following coredump files.
> 
>   - regs: NVMe controller registers (00h to 4Fh)
>   - sq<qid>: Submission queue
>   - cq<qid>: Completion queue
>   - telemetry-ctrl-log: Telemetry controller-initiated log (if available)
>   - data: Empty


* Re: [PATCH v2 6/7] nvme-pci: add device coredump support
  2019-05-07 16:58 ` [PATCH v2 6/7] nvme-pci: add device coredump support Akinobu Mita
  2019-05-07 17:07   ` Keith Busch
@ 2019-05-07 17:44   ` Heitke, Kenneth
  2019-05-07 20:31   ` Heitke, Kenneth
       [not found]   ` <CGME20190507171318epcas5p129bb73b39447d62a7d266ed461687488@epcms2p3>
  3 siblings, 0 replies; 22+ messages in thread
From: Heitke, Kenneth @ 2019-05-07 17:44 UTC (permalink / raw)
  To: Akinobu Mita, linux-nvme, linux-kernel
  Cc: Jens Axboe, Sagi Grimberg, Keith Busch, Minwoo Im, Johannes Berg,
	Christoph Hellwig



On 5/7/2019 10:58 AM, Akinobu Mita wrote:
> This enables to capture snapshot of controller information via device
> coredump machanism.

s/machanism/mechanism/

> 
> The nvme device coredump creates the following coredump files.
> 
> - regs: NVMe controller registers (00h to 4Fh)
> - sq<qid>: Submission queue
> - cq<qid>: Completion queue
> - telemetry-ctrl-log: Telemetry controller-initiated log (if available)
> - data: Empty
> 
> The reason for an empty 'data' file is to provide a uniform way to notify
> the device coredump is no longer needed by writing the 'data' file.
> 
> Since all existing drivers using the device coredump provide a 'data' file
> if the nvme device coredump doesn't provide it, the userspace programs need
> to know which driver provides what coredump file.
> 
> Cc: Johannes Berg <johannes@sipsolutions.net>
> Cc: Keith Busch <keith.busch@intel.com>
> Cc: Jens Axboe <axboe@fb.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Sagi Grimberg <sagi@grimberg.me>
> Cc: Minwoo Im <minwoo.im.dev@gmail.com>
> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>


* Re: [PATCH v2 4/7] nvme.h: add telemetry log page definitions
  2019-05-07 16:58 ` [PATCH v2 4/7] nvme.h: add telemetry log page definitions Akinobu Mita
  2019-05-07 17:28   ` Heitke, Kenneth
@ 2019-05-07 17:52   ` Heitke, Kenneth
  2019-05-08 15:54     ` Akinobu Mita
  1 sibling, 1 reply; 22+ messages in thread
From: Heitke, Kenneth @ 2019-05-07 17:52 UTC (permalink / raw)
  To: Akinobu Mita, linux-nvme, linux-kernel
  Cc: Jens Axboe, Sagi Grimberg, Keith Busch, Minwoo Im, Johannes Berg,
	Christoph Hellwig



On 5/7/2019 10:58 AM, Akinobu Mita wrote:
> Copy telemetry log page definitions from nvme-cli.
> 
> Cc: Johannes Berg <johannes@sipsolutions.net>
> Cc: Keith Busch <keith.busch@intel.com>
> Cc: Jens Axboe <axboe@fb.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Sagi Grimberg <sagi@grimberg.me>
> Cc: Minwoo Im <minwoo.im.dev@gmail.com>
> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
> ---
> * v2
> - New patch in this version.
> 
>   include/linux/nvme.h | 23 +++++++++++++++++++++++
>   1 file changed, 23 insertions(+)
> 
> diff --git a/include/linux/nvme.h b/include/linux/nvme.h
> index c40720c..5217fe4 100644
> --- a/include/linux/nvme.h
> +++ b/include/linux/nvme.h
> @@ -396,6 +396,28 @@ enum {
>   	NVME_NIDT_UUID		= 0x03,
>   };
>   
> +/* Derived from 1.3a Figure 101: Get Log Page – Telemetry Host
> + * -Initiated Log (Log Identifier 07h)
> + */

Is this Host Initiated or Controller Initiated? The comment says host
initiated, but everything else seems to indicate controller initiated.
Is controller initiated even the correct choice, given that the
controller would have sent an AER to indicate that the host should pull
the telemetry data?


> +struct nvme_telemetry_log_page_hdr {
> +	__u8    lpi; /* Log page identifier */
> +	__u8    rsvd[4];
> +	__u8    iee_oui[3];
> +	__le16  dalb1; /* Data area 1 last block */
> +	__le16  dalb2; /* Data area 2 last block */
> +	__le16  dalb3; /* Data area 3 last block */
> +	__u8    rsvd1[368]; /* TODO verify */
> +	__u8    ctrlavail; /* Controller initiated data avail?*/
> +	__u8    ctrldgn; /* Controller initiated telemetry Data Gen # */
> +	__u8    rsnident[128];
> +	/* We'll have to double fetch so we can get the header,
> +	 * parse dalb1->3 determine how much size we need for the
> +	 * log then alloc below. Or just do a secondary non-struct
> +	 * allocation.
> +	 */
> +	__u8    telemetry_dataarea[0];
> +};
> +
>   struct nvme_smart_log {
>   	__u8			critical_warning;
>   	__u8			temperature[2];
> @@ -832,6 +854,7 @@ enum {
>   	NVME_LOG_FW_SLOT	= 0x03,
>   	NVME_LOG_CHANGED_NS	= 0x04,
>   	NVME_LOG_CMD_EFFECTS	= 0x05,
> +	NVME_LOG_TELEMETRY_CTRL	= 0x08,
>   	NVME_LOG_ANA		= 0x0c,
>   	NVME_LOG_DISC		= 0x70,
>   	NVME_LOG_RESERVATION	= 0x80,
> 


* Re: [PATCH v2 6/7] nvme-pci: add device coredump support
  2019-05-07 16:58 ` [PATCH v2 6/7] nvme-pci: add device coredump support Akinobu Mita
  2019-05-07 17:07   ` Keith Busch
  2019-05-07 17:44   ` Heitke, Kenneth
@ 2019-05-07 20:31   ` Heitke, Kenneth
  2019-05-07 21:22     ` Keith Busch
       [not found]   ` <CGME20190507171318epcas5p129bb73b39447d62a7d266ed461687488@epcms2p3>
  3 siblings, 1 reply; 22+ messages in thread
From: Heitke, Kenneth @ 2019-05-07 20:31 UTC (permalink / raw)
  To: Akinobu Mita, linux-nvme, linux-kernel
  Cc: Jens Axboe, Sagi Grimberg, Keith Busch, Minwoo Im, Johannes Berg,
	Christoph Hellwig



On 5/7/2019 10:58 AM, Akinobu Mita wrote:
> +
> +static int nvme_get_telemetry_log_blocks(struct nvme_ctrl *ctrl, void *buf,
> +					 size_t bytes, loff_t offset)
> +{
> +	const size_t chunk_size = ctrl->max_hw_sectors * ctrl->page_size;

Just curious if chunk_size is correct since page size and block size can
be different.


> +	loff_t pos = 0;
> +
> +	while (pos < bytes) {
> +		size_t size = min_t(size_t, bytes - pos, chunk_size);
> +		int ret;
> +
> +		ret = nvme_get_log(ctrl, NVME_NSID_ALL, NVME_LOG_TELEMETRY_CTRL,
> +				   0, buf + pos, size, offset + pos);
> +		if (ret)
> +			return ret;
> +
> +		pos += size;
> +	}
> +
> +	return 0;
> +}


* Re: [PATCH v2 6/7] nvme-pci: add device coredump support
  2019-05-07 20:31   ` Heitke, Kenneth
@ 2019-05-07 21:22     ` Keith Busch
  2019-05-08 15:56       ` Akinobu Mita
  0 siblings, 1 reply; 22+ messages in thread
From: Keith Busch @ 2019-05-07 21:22 UTC (permalink / raw)
  To: Heitke, Kenneth
  Cc: Akinobu Mita, linux-nvme, linux-kernel, Jens Axboe,
	Sagi Grimberg, Keith Busch, Minwoo Im, Johannes Berg,
	Christoph Hellwig

On Tue, May 07, 2019 at 02:31:41PM -0600, Heitke, Kenneth wrote:
> On 5/7/2019 10:58 AM, Akinobu Mita wrote:
> > +
> > +static int nvme_get_telemetry_log_blocks(struct nvme_ctrl *ctrl, void *buf,
> > +					 size_t bytes, loff_t offset)
> > +{
> > +	const size_t chunk_size = ctrl->max_hw_sectors * ctrl->page_size;
> 
> Just curious if chunk_size is correct since page size and block size can
> be different.

They're always different. ctrl->page_size is hard-coded to 4k, while
sectors are always 512b.
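
To put numbers on it: max_hw_sectors is in 512-byte units, so with
max_hw_sectors = 256 (a 128 KiB transfer limit) the expression
max_hw_sectors * ctrl->page_size works out to 1 MiB, eight times what
the device can actually transfer in one command; max_hw_sectors << 9
would give the intended chunk size in bytes.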


* Re: [PATCH v2 6/7] nvme-pci: add device coredump support
       [not found]   ` <CGME20190507171318epcas5p129bb73b39447d62a7d266ed461687488@epcms2p3>
@ 2019-05-08  0:25     ` Minwoo Im
  2019-05-08 16:02       ` Akinobu Mita
  0 siblings, 1 reply; 22+ messages in thread
From: Minwoo Im @ 2019-05-08  0:25 UTC (permalink / raw)
  To: Keith Busch, Akinobu Mita, Minwoo Im
  Cc: Jens Axboe, Sagi Grimberg, linux-kernel, linux-nvme, Keith Busch,
	Minwoo Im, Johannes Berg, Christoph Hellwig

> This is a bit of a mine field. The shutdown_lock is held when reclaiming
> requests that didn't see a response. If you're holding it here and your
> telemetry log page times out, we're going to deadlock. And since the
> controller is probably in a buggered state when you try to retrieve one,
> I would guess an unrecoverable timeout is the most likely outcome.

Akinobu,

I actually agree with Keith.  In my experience, there is always an
internal error inside the device when a timeout occurs in the nvme
driver, which means the following command might not be completed due to
lack of response from the device.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 3/7] devcoredump: allow to create several coredump files in one device
  2019-05-07 17:35   ` Heitke, Kenneth
@ 2019-05-08 15:40     ` Akinobu Mita
  0 siblings, 0 replies; 22+ messages in thread
From: Akinobu Mita @ 2019-05-08 15:40 UTC (permalink / raw)
  To: Heitke, Kenneth
  Cc: linux-nvme, LKML, Jens Axboe, Sagi Grimberg, Keith Busch,
	Minwoo Im, Johannes Berg, Christoph Hellwig

On Wed, May 8, 2019 at 2:35 Heitke, Kenneth <kenneth.heitke@intel.com> wrote:
>
>
>
> On 5/7/2019 10:58 AM, Akinobu Mita wrote:
> > @@ -292,6 +309,12 @@ void dev_coredumpm(struct device *dev, struct module *owner,
> >       if (device_add(&devcd->devcd_dev))
> >               goto put_device;
> >
> > +     for (i = 0; i < devcd->num_files; i++) {
> > +             if (device_create_bin_file(&devcd->devcd_dev,
> > +                                        &devcd->files[i].bin_attr))
> > +                     /* nothing - some files will be missing */;
>
> Is the conditional necessary if you aren't going to do anything?

device_create_bin_file() is declared with __must_check, so ignoring the
return value emits a warning.
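
For illustration, a hedged alternative to the empty-statement pattern
would be to consume the return value with a warning instead (a sketch
only; dev_warn() is one option among several):

	for (i = 0; i < devcd->num_files; i++) {
		int ret = device_create_bin_file(&devcd->devcd_dev,
						 &devcd->files[i].bin_attr);

		if (ret)
			/* Some files will be missing; note it, don't fail. */
			dev_warn(&devcd->devcd_dev,
				 "no coredump file %d: %d\n", i, ret);
	}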

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 4/7] nvme.h: add telemetry log page definitions
  2019-05-07 17:28   ` Heitke, Kenneth
@ 2019-05-08 15:42     ` Akinobu Mita
  0 siblings, 0 replies; 22+ messages in thread
From: Akinobu Mita @ 2019-05-08 15:42 UTC (permalink / raw)
  To: Heitke, Kenneth
  Cc: linux-nvme, LKML, Jens Axboe, Sagi Grimberg, Keith Busch,
	Minwoo Im, Johannes Berg, Christoph Hellwig

On Wed, May 8, 2019 at 2:28 Heitke, Kenneth <kenneth.heitke@intel.com> wrote:
>
>
>
> On 5/7/2019 10:58 AM, Akinobu Mita wrote:
> > Copy telemetry log page definitions from nvme-cli.
> >
> > Cc: Johannes Berg <johannes@sipsolutions.net>
> > Cc: Keith Busch <keith.busch@intel.com>
> > Cc: Jens Axboe <axboe@fb.com>
> > Cc: Christoph Hellwig <hch@lst.de>
> > Cc: Sagi Grimberg <sagi@grimberg.me>
> > Cc: Minwoo Im <minwoo.im.dev@gmail.com>
> > Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
> > ---
> > * v2
> > - New patch in this version.
> >
> >   include/linux/nvme.h | 23 +++++++++++++++++++++++
> >   1 file changed, 23 insertions(+)
> >
> > diff --git a/include/linux/nvme.h b/include/linux/nvme.h
> > index c40720c..5217fe4 100644
> > --- a/include/linux/nvme.h
> > +++ b/include/linux/nvme.h
> > @@ -396,6 +396,28 @@ enum {
> >       NVME_NIDT_UUID          = 0x03,
> >   };
> >
> > +/* Derived from 1.3a Figure 101: Get Log Page – Telemetry Host
> > + * -Initiated Log (Log Identifier 07h)
> > + */
> > +struct nvme_telemetry_log_page_hdr {
> > +     __u8    lpi; /* Log page identifier */
> > +     __u8    rsvd[4];
> > +     __u8    iee_oui[3];
> > +     __le16  dalb1; /* Data area 1 last block */
> > +     __le16  dalb2; /* Data area 2 last block */
> > +     __le16  dalb3; /* Data area 3 last block */
> > +     __u8    rsvd1[368]; /* TODO verify */
>
> Remove the TODO

OK.

> > +     __u8    ctrlavail; /* Controller initiated data avail?*/
> > +     __u8    ctrldgn; /* Controller initiated telemetry Data Gen # */
> > +     __u8    rsnident[128];
> > +     /* We'll have to double fetch so we can get the header,
> > +      * parse dalb1->3 determine how much size we need for the
> > +      * log then alloc below. Or just do a secondary non-struct
> > +      * allocation.
> > +      */
>
> This comment isn't necessary. You usually can't read the entire
> telemetry log at once and the header is a fixed size. You would likely
> just read the header followed by reads of the different data areas.

This comment is derived from nvme-cli, so first I'll send a patch for
nvme-cli.  If the changes are accepted, I'll update this comment, too.
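
For reference, a rough sketch of the two-step read described above,
reusing the nvme_get_log() call shape from patch 6/7 (untested; assumes
ctrl is in scope, and the size math should be checked against the spec):

	struct nvme_telemetry_log_page_hdr hdr;
	size_t log_bytes;
	int ret;

	/* Step 1: fetch only the fixed-size (512-byte) header. */
	ret = nvme_get_log(ctrl, NVME_NSID_ALL, NVME_LOG_TELEMETRY_CTRL, 0,
			   &hdr, sizeof(hdr), 0);
	if (ret)
		return ret;

	/* Step 2: dalb3 is the last 512-byte block of the largest data
	 * area, so (dalb3 + 1) blocks should cover the whole log,
	 * header included. */
	log_bytes = (le16_to_cpu(hdr.dalb3) + 1) * 512;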

> > +     __u8    telemetry_dataarea[0];
> > +};
> > +
> >   struct nvme_smart_log {
> >       __u8                    critical_warning;
> >       __u8                    temperature[2];
> > @@ -832,6 +854,7 @@ enum {
> >       NVME_LOG_FW_SLOT        = 0x03,
> >       NVME_LOG_CHANGED_NS     = 0x04,
> >       NVME_LOG_CMD_EFFECTS    = 0x05,
> > +     NVME_LOG_TELEMETRY_CTRL = 0x08,
> >       NVME_LOG_ANA            = 0x0c,
> >       NVME_LOG_DISC           = 0x70,
> >       NVME_LOG_RESERVATION    = 0x80,
> >

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 4/7] nvme.h: add telemetry log page definitions
  2019-05-07 17:52   ` Heitke, Kenneth
@ 2019-05-08 15:54     ` Akinobu Mita
  0 siblings, 0 replies; 22+ messages in thread
From: Akinobu Mita @ 2019-05-08 15:54 UTC (permalink / raw)
  To: Heitke, Kenneth
  Cc: linux-nvme, LKML, Jens Axboe, Sagi Grimberg, Keith Busch,
	Minwoo Im, Johannes Berg, Christoph Hellwig

On Wed, May 8, 2019 at 2:53 Heitke, Kenneth <kenneth.heitke@intel.com> wrote:
>
>
>
> On 5/7/2019 10:58 AM, Akinobu Mita wrote:
> > Copy telemetry log page definitions from nvme-cli.
> >
> > Cc: Johannes Berg <johannes@sipsolutions.net>
> > Cc: Keith Busch <keith.busch@intel.com>
> > Cc: Jens Axboe <axboe@fb.com>
> > Cc: Christoph Hellwig <hch@lst.de>
> > Cc: Sagi Grimberg <sagi@grimberg.me>
> > Cc: Minwoo Im <minwoo.im.dev@gmail.com>
> > Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
> > ---
> > * v2
> > - New patch in this version.
> >
> >   include/linux/nvme.h | 23 +++++++++++++++++++++++
> >   1 file changed, 23 insertions(+)
> >
> > diff --git a/include/linux/nvme.h b/include/linux/nvme.h
> > index c40720c..5217fe4 100644
> > --- a/include/linux/nvme.h
> > +++ b/include/linux/nvme.h
> > @@ -396,6 +396,28 @@ enum {
> >       NVME_NIDT_UUID          = 0x03,
> >   };
> >
> > +/* Derived from 1.3a Figure 101: Get Log Page – Telemetry Host
> > + * -Initiated Log (Log Identifier 07h)
> > + */
>
> Is this Host Initiated or Controller Initiated? The comment says host
> initiated but everything else seems to indicate controller initiated.

Both the telemetry host-initiated and controller-initiated log headers
have the same structure.  If this comment is confusing, I'll consider
removing it as well.
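
Illustrative only (the host-initiated identifier below comes from the
spec, not from this series):

	enum {
		NVME_LOG_TELEMETRY_HOST	= 0x07,	/* spec figure 101 */
		NVME_LOG_TELEMETRY_CTRL	= 0x08,	/* added in patch 4/7 */
	};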

> Is controller initiated even the correct choice because the controller
> would have sent an AER to indicate that the host should pull the
> telemetry data.

It seems useful to retrieve the telemetry log continually with the aid
of a user-space tool reacting to an Asynchronous Event.

Similarly, it could be useful to retrieve the telemetry log as soon as
the device has successfully recovered from the crash.  (Although I have
yet to find a device that has the Telemetry Controller-Initiated Data
Available field set to 1h.)

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 6/7] nvme-pci: add device coredump support
  2019-05-07 21:22     ` Keith Busch
@ 2019-05-08 15:56       ` Akinobu Mita
  0 siblings, 0 replies; 22+ messages in thread
From: Akinobu Mita @ 2019-05-08 15:56 UTC (permalink / raw)
  To: Keith Busch
  Cc: Heitke, Kenneth, linux-nvme, LKML, Jens Axboe, Sagi Grimberg,
	Keith Busch, Minwoo Im, Johannes Berg, Christoph Hellwig

On Wed, May 8, 2019 at 6:28 Keith Busch <kbusch@kernel.org> wrote:
>
> On Tue, May 07, 2019 at 02:31:41PM -0600, Heitke, Kenneth wrote:
> > On 5/7/2019 10:58 AM, Akinobu Mita wrote:
> > > +
> > > +static int nvme_get_telemetry_log_blocks(struct nvme_ctrl *ctrl, void *buf,
> > > +                                    size_t bytes, loff_t offset)
> > > +{
> > > +   const size_t chunk_size = ctrl->max_hw_sectors * ctrl->page_size;
> >
> > Just curious if chunk_size is correct since page size and block size can
> > be different.
>
> They're always different. ctrl->page_size is hard-coded to 4k, while
> sectors are always 512b.

Oops.  I misunderstood how ctrl->max_hw_sectors is initialized from MDTS.
An overflow check is also required here for architectures where size_t
is "unsigned int".

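A hedged sketch of such a check, using check_mul_overflow() from
<linux/overflow.h> (one option among several; both operands forced to
size_t on purpose):

	size_t chunk_size;

	/* Fail early instead of letting a 32-bit size_t wrap. */
	if (check_mul_overflow((size_t)ctrl->max_hw_sectors,
			       (size_t)ctrl->page_size, &chunk_size))
		return -EOVERFLOW;
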
^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 6/7] nvme-pci: add device coredump support
  2019-05-08  0:25     ` Minwoo Im
@ 2019-05-08 16:02       ` Akinobu Mita
  0 siblings, 0 replies; 22+ messages in thread
From: Akinobu Mita @ 2019-05-08 16:02 UTC (permalink / raw)
  To: minwoo.im
  Cc: Keith Busch, Jens Axboe, Sagi Grimberg, linux-kernel, linux-nvme,
	Keith Busch, Minwoo Im, Johannes Berg, Christoph Hellwig

On Wed, May 8, 2019 at 9:25 Minwoo Im <minwoo.im@samsung.com> wrote:
>
> > This is a bit of a mine field. The shutdown_lock is held when reclaiming
> > requests that didn't see a response. If you're holding it here and your
> > telemetry log page times out, we're going to deadlock. And since the
> > controller is probably in a buggered state when you try to retrieve one,
> > I would guess an unrecoverable timeout is the most likely outcome.
>
> Akinobu,
>
> I actually agree with Keith.  In my experience, there is always an
> internal error inside the device when a timeout occurs in the nvme
> driver, which means the following command might not be completed due to
> lack of response from the device.

nvme_coredump() is the .coredump() callback of device_driver, which is
called when anything is written to /sys/devices/.../coredump.  Providing
this callback is optional, but simply removing this manual device
coredump method would be a bit inconvenient.

So instead of directly retrieving the snapshot with the shutdown_lock
held in this callback, I'll change it to just schedule the reset work;
the actual device coredump will then be triggered by the same procedure
implemented in patch 7/7.  Therefore the telemetry log is retrieved only
when the controller is successfully recovered from the crash.
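
A rough shape of that change (hypothetical names: the NVME_DUMP_PENDING
flag and the flags field are not in the posted series):

	static void nvme_coredump(struct device *dev)
	{
		struct nvme_dev *ndev = dev_get_drvdata(dev);

		/* Only mark a dump as pending; the reset path collects it. */
		set_bit(NVME_DUMP_PENDING, &ndev->flags);
		nvme_reset_ctrl_sync(&ndev->ctrl);
	}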

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2019-05-08 16:02 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-05-07 16:58 [PATCH v2 0/7] nvme-pci: support device coredump Akinobu Mita
2019-05-07 16:58 ` [PATCH v2 1/7] devcoredump: use memory_read_from_buffer Akinobu Mita
2019-05-07 16:58 ` [PATCH v2 2/7] devcoredump: fix typo in comment Akinobu Mita
2019-05-07 16:58 ` [PATCH v2 3/7] devcoredump: allow to create several coredump files in one device Akinobu Mita
2019-05-07 17:35   ` Heitke, Kenneth
2019-05-08 15:40     ` Akinobu Mita
2019-05-07 16:58 ` [PATCH v2 4/7] nvme.h: add telemetry log page definitions Akinobu Mita
2019-05-07 17:28   ` Heitke, Kenneth
2019-05-08 15:42     ` Akinobu Mita
2019-05-07 17:52   ` Heitke, Kenneth
2019-05-08 15:54     ` Akinobu Mita
2019-05-07 16:58 ` [PATCH v2 5/7] nvme: add facility to check log page attributes Akinobu Mita
2019-05-07 16:58 ` [PATCH v2 6/7] nvme-pci: add device coredump support Akinobu Mita
2019-05-07 17:07   ` Keith Busch
2019-05-07 17:44   ` Heitke, Kenneth
2019-05-07 20:31   ` Heitke, Kenneth
2019-05-07 21:22     ` Keith Busch
2019-05-08 15:56       ` Akinobu Mita
     [not found]   ` <CGME20190507171318epcas5p129bb73b39447d62a7d266ed461687488@epcms2p3>
2019-05-08  0:25     ` Minwoo Im
2019-05-08 16:02       ` Akinobu Mita
2019-05-07 16:58 ` [PATCH v2 7/7] nvme-pci: trigger device coredump on command timeout Akinobu Mita
2019-05-07 17:43 ` [PATCH v2 0/7] nvme-pci: support device coredump Heitke, Kenneth

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).