* [PATCH v3 0/7] nvme-pci: support device coredump
@ 2019-05-12 15:54 Akinobu Mita
  2019-05-12 15:54 ` [PATCH v3 1/7] devcoredump: use memory_read_from_buffer Akinobu Mita
                   ` (8 more replies)
  0 siblings, 9 replies; 22+ messages in thread
From: Akinobu Mita @ 2019-05-12 15:54 UTC (permalink / raw)
  To: linux-nvme, linux-kernel
  Cc: Akinobu Mita, Johannes Berg, Keith Busch, Jens Axboe,
	Christoph Hellwig, Sagi Grimberg, Minwoo Im, Kenneth Heitke

This enables collecting a snapshot of controller information via the device
coredump mechanism.  The nvme device coredump is triggered when a command
timeout occurs, and can also be triggered by writing to a sysfs attribute.

After finishing the nvme device coredump, the following files are created.

 - regs: NVMe controller registers (00h to 4Fh)
 - sq<qid>: Submission queue
 - cq<qid>: Completion queue
 - telemetry-ctrl-log: Telemetry controller-initiated log (if available)
 - data: Empty

The device coredump mechanism currently allows drivers to create only a
single coredump file, so this also provides a new function that allows
drivers to create several device coredump files for one crashed device.
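
For illustration, here is a hypothetical session (the PCI address and the
devcd instance number are invented; the sysfs trigger is added in patch 7/7):

	# echo 1 > /sys/bus/pci/devices/0000:01:00.0/coredump
	# ls /sys/class/devcoredump/devcd1/
	cq0  cq1  data  failing_device  regs  sq0  sq1  telemetry-ctrl-log
	# cp /sys/class/devcoredump/devcd1/regs regs.bin
	# echo 1 > /sys/class/devcoredump/devcd1/data    # discard the dump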

* v3
- Merge 'add telemetry log page definitions' patch and 'add facility to
  check log page attributes' patch
- Copy struct nvme_telemetry_log_page_hdr from the latest nvme-cli
- Add BUILD_BUG_ON for the size of struct nvme_telemetry_log_page_hdr
- Fix typo s/machanism/mechanism/ in commit log
- Fix max transfer size calculation for get log page
- Add function comments
- Extract 'enable to trigger device coredump by hand' patch
- Don't try to get telemetry log when admin queue is not available
- Avoid deadlock in .coredump callback

* v2
- Add Reviewed-by tag.
- Add patch to fix typo in comment
- Remove unneeded braces.
- Allocate device_entry followed by an array of devcd_file elements.
- Add telemetry log page definitions
- Add facility to check log page attributes
- Exclude the doorbell registers from register dump.
- Save controller registers in a binary format instead of a text format.
- Create an empty 'data' file in the device coredump.
- Save telemetry controller-initiated log if available
- Make coredump procedure into two phases (before resetting controller and
  after resetting as soon as admin queue is available).

Akinobu Mita (7):
  devcoredump: use memory_read_from_buffer
  devcoredump: fix typo in comment
  devcoredump: allow to create several coredump files in one device
  nvme: add basic facility to get telemetry log page
  nvme-pci: add device coredump infrastructure
  nvme-pci: trigger device coredump on command timeout
  nvme-pci: enable to trigger device coredump by hand

 drivers/base/devcoredump.c  | 168 +++++++++------
 drivers/nvme/host/Kconfig   |   1 +
 drivers/nvme/host/core.c    |   3 +
 drivers/nvme/host/nvme.h    |   1 +
 drivers/nvme/host/pci.c     | 494 ++++++++++++++++++++++++++++++++++++++++++--
 include/linux/devcoredump.h |  33 +++
 include/linux/nvme.h        |  17 ++
 7 files changed, 644 insertions(+), 73 deletions(-)

Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Minwoo Im <minwoo.im.dev@gmail.com>
Cc: Kenneth Heitke <kenneth.heitke@intel.com>
-- 
2.7.4



* [PATCH v3 1/7] devcoredump: use memory_read_from_buffer
  2019-05-12 15:54 [PATCH v3 0/7] nvme-pci: support device coredump Akinobu Mita
@ 2019-05-12 15:54 ` Akinobu Mita
  2019-05-13 15:28   ` Chaitanya Kulkarni
  2019-05-12 15:54 ` [PATCH v3 2/7] devcoredump: fix typo in comment Akinobu Mita
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 22+ messages in thread
From: Akinobu Mita @ 2019-05-12 15:54 UTC (permalink / raw)
  To: linux-nvme, linux-kernel
  Cc: Akinobu Mita, Johannes Berg, Keith Busch, Jens Axboe,
	Christoph Hellwig, Sagi Grimberg, Minwoo Im, Kenneth Heitke

Use memory_read_from_buffer() to simplify devcd_readv().

Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Minwoo Im <minwoo.im.dev@gmail.com>
Cc: Kenneth Heitke <kenneth.heitke@intel.com>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
---
* v3
- No change since v2

 drivers/base/devcoredump.c | 11 +----------
 1 file changed, 1 insertion(+), 10 deletions(-)

diff --git a/drivers/base/devcoredump.c b/drivers/base/devcoredump.c
index f1a3353..3c960a6 100644
--- a/drivers/base/devcoredump.c
+++ b/drivers/base/devcoredump.c
@@ -164,16 +164,7 @@ static struct class devcd_class = {
 static ssize_t devcd_readv(char *buffer, loff_t offset, size_t count,
 			   void *data, size_t datalen)
 {
-	if (offset > datalen)
-		return -EINVAL;
-
-	if (offset + count > datalen)
-		count = datalen - offset;
-
-	if (count)
-		memcpy(buffer, ((u8 *)data) + offset, count);
-
-	return count;
+	return memory_read_from_buffer(buffer, count, &offset, data, datalen);
 }
 
 static void devcd_freev(void *data)
-- 
2.7.4



* [PATCH v3 2/7] devcoredump: fix typo in comment
  2019-05-12 15:54 [PATCH v3 0/7] nvme-pci: support device coredump Akinobu Mita
  2019-05-12 15:54 ` [PATCH v3 1/7] devcoredump: use memory_read_from_buffer Akinobu Mita
@ 2019-05-12 15:54 ` Akinobu Mita
  2019-05-13 15:24   ` Chaitanya Kulkarni
  2019-05-12 15:54 ` [PATCH v3 3/7] devcoredump: allow to create several coredump files in one device Akinobu Mita
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 22+ messages in thread
From: Akinobu Mita @ 2019-05-12 15:54 UTC (permalink / raw)
  To: linux-nvme, linux-kernel
  Cc: Akinobu Mita, Johannes Berg, Keith Busch, Jens Axboe,
	Christoph Hellwig, Sagi Grimberg, Minwoo Im, Kenneth Heitke

s/dev_coredumpmsg/dev_coredumpsg/

Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Minwoo Im <minwoo.im.dev@gmail.com>
Cc: Kenneth Heitke <kenneth.heitke@intel.com>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
---
* v3
- No change since v2

 drivers/base/devcoredump.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/base/devcoredump.c b/drivers/base/devcoredump.c
index 3c960a6..e42d0b5 100644
--- a/drivers/base/devcoredump.c
+++ b/drivers/base/devcoredump.c
@@ -314,7 +314,7 @@ void dev_coredumpm(struct device *dev, struct module *owner,
 EXPORT_SYMBOL_GPL(dev_coredumpm);
 
 /**
- * dev_coredumpmsg - create device coredump that uses scatterlist as data
+ * dev_coredumpsg - create device coredump that uses scatterlist as data
  * parameter
  * @dev: the struct device for the crashed device
  * @table: the dump data
-- 
2.7.4



* [PATCH v3 3/7] devcoredump: allow to create several coredump files in one device
  2019-05-12 15:54 [PATCH v3 0/7] nvme-pci: support device coredump Akinobu Mita
  2019-05-12 15:54 ` [PATCH v3 1/7] devcoredump: use memory_read_from_buffer Akinobu Mita
  2019-05-12 15:54 ` [PATCH v3 2/7] devcoredump: fix typo in comment Akinobu Mita
@ 2019-05-12 15:54 ` Akinobu Mita
  2019-05-12 15:54 ` [PATCH v3 4/7] nvme: add basic facility to get telemetry log page Akinobu Mita
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 22+ messages in thread
From: Akinobu Mita @ 2019-05-12 15:54 UTC (permalink / raw)
  To: linux-nvme, linux-kernel
  Cc: Akinobu Mita, Johannes Berg, Keith Busch, Jens Axboe,
	Christoph Hellwig, Sagi Grimberg, Minwoo Im, Kenneth Heitke

The device coredump mechanism currently allows drivers to create only a
single coredump file.  If there are several binary blobs to dump, we need
to define a binary format or convert to a text format in order to put them
into a single coredump file.

This provides a new function that allows drivers to create several device
coredump files for one crashed device.
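
To illustrate the intended usage, here is a minimal, hypothetical driver
sketch (struct foo_dev, its fields, and the file names are invented; buffer
allocation and error handling are omitted):

	static ssize_t foo_dump_read(char *buffer, loff_t offset, size_t count,
				     void *data, size_t datalen)
	{
		return memory_read_from_buffer(buffer, count, &offset, data,
					       datalen);
	}

	static void foo_dump_free(void *data)
	{
		kfree(data);
	}

	static void foo_crash_dump(struct foo_dev *fdev)
	{
		struct dev_coredumpm_bulk_data files[2] = {};

		files[0].name = "registers";
		files[0].data = fdev->reg_snapshot;	/* kmalloc'ed */
		files[0].datalen = fdev->reg_snapshot_len;
		files[0].read = foo_dump_read;
		files[0].free = foo_dump_free;

		files[1].name = "fw-state";
		files[1].data = fdev->fw_state;		/* kmalloc'ed */
		files[1].datalen = fdev->fw_state_len;
		files[1].read = foo_dump_read;
		files[1].free = foo_dump_free;

		/* ownership of the data buffers passes to devcoredump here;
		 * each buffer is released via its .free() callback when the
		 * dump is discarded */
		dev_coredumpm_bulk(fdev->dev, THIS_MODULE, GFP_KERNEL,
				   files, ARRAY_SIZE(files));
	}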

Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Minwoo Im <minwoo.im.dev@gmail.com>
Cc: Kenneth Heitke <kenneth.heitke@intel.com>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
---
* v3
- No change since v2

 drivers/base/devcoredump.c  | 155 ++++++++++++++++++++++++++++++--------------
 include/linux/devcoredump.h |  33 ++++++++++
 2 files changed, 139 insertions(+), 49 deletions(-)

diff --git a/drivers/base/devcoredump.c b/drivers/base/devcoredump.c
index e42d0b5..4dd6dba 100644
--- a/drivers/base/devcoredump.c
+++ b/drivers/base/devcoredump.c
@@ -25,16 +25,20 @@ static bool devcd_disabled;
 /* if data isn't read by userspace after 5 minutes then delete it */
 #define DEVCD_TIMEOUT	(HZ * 60 * 5)
 
-struct devcd_entry {
-	struct device devcd_dev;
-	void *data;
-	size_t datalen;
-	struct module *owner;
+struct devcd_file {
+	struct bin_attribute bin_attr;
 	ssize_t (*read)(char *buffer, loff_t offset, size_t count,
 			void *data, size_t datalen);
 	void (*free)(void *data);
+};
+
+struct devcd_entry {
+	struct device devcd_dev;
+	struct module *owner;
 	struct delayed_work del_wk;
 	struct device *failing_dev;
+	int num_files;
+	struct devcd_file files[];
 };
 
 static struct devcd_entry *dev_to_devcd(struct device *dev)
@@ -45,8 +49,14 @@ static struct devcd_entry *dev_to_devcd(struct device *dev)
 static void devcd_dev_release(struct device *dev)
 {
 	struct devcd_entry *devcd = dev_to_devcd(dev);
+	int i;
+
+	for (i = 0; i < devcd->num_files; i++) {
+		struct devcd_file *file = &devcd->files[i];
+
+		file->free(file->bin_attr.private);
+	}
 
-	devcd->free(devcd->data);
 	module_put(devcd->owner);
 
 	/*
@@ -64,9 +74,14 @@ static void devcd_dev_release(struct device *dev)
 static void devcd_del(struct work_struct *wk)
 {
 	struct devcd_entry *devcd;
+	int i;
 
 	devcd = container_of(wk, struct devcd_entry, del_wk.work);
 
+	for (i = 0; i < devcd->num_files; i++)
+		device_remove_bin_file(&devcd->devcd_dev,
+				       &devcd->files[i].bin_attr);
+
 	device_del(&devcd->devcd_dev);
 	put_device(&devcd->devcd_dev);
 }
@@ -75,10 +90,11 @@ static ssize_t devcd_data_read(struct file *filp, struct kobject *kobj,
 			       struct bin_attribute *bin_attr,
 			       char *buffer, loff_t offset, size_t count)
 {
-	struct device *dev = kobj_to_dev(kobj);
-	struct devcd_entry *devcd = dev_to_devcd(dev);
+	struct devcd_file *file =
+		container_of(bin_attr, struct devcd_file, bin_attr);
 
-	return devcd->read(buffer, offset, count, devcd->data, devcd->datalen);
+	return file->read(buffer, offset, count, bin_attr->private,
+			  bin_attr->size);
 }
 
 static ssize_t devcd_data_write(struct file *filp, struct kobject *kobj,
@@ -93,25 +109,6 @@ static ssize_t devcd_data_write(struct file *filp, struct kobject *kobj,
 	return count;
 }
 
-static struct bin_attribute devcd_attr_data = {
-	.attr = { .name = "data", .mode = S_IRUSR | S_IWUSR, },
-	.size = 0,
-	.read = devcd_data_read,
-	.write = devcd_data_write,
-};
-
-static struct bin_attribute *devcd_dev_bin_attrs[] = {
-	&devcd_attr_data, NULL,
-};
-
-static const struct attribute_group devcd_dev_group = {
-	.bin_attrs = devcd_dev_bin_attrs,
-};
-
-static const struct attribute_group *devcd_dev_groups[] = {
-	&devcd_dev_group, NULL,
-};
-
 static int devcd_free(struct device *dev, void *data)
 {
 	struct devcd_entry *devcd = dev_to_devcd(dev);
@@ -157,7 +154,6 @@ static struct class devcd_class = {
 	.name		= "devcoredump",
 	.owner		= THIS_MODULE,
 	.dev_release	= devcd_dev_release,
-	.dev_groups	= devcd_dev_groups,
 	.class_groups	= devcd_class_groups,
 };
 
@@ -234,30 +230,55 @@ static ssize_t devcd_read_from_sgtable(char *buffer, loff_t offset,
 				  offset);
 }
 
+static struct devcd_entry *devcd_alloc(struct dev_coredumpm_bulk_data *files,
+				       int num_files, gfp_t gfp)
+{
+	struct devcd_entry *devcd;
+	int i;
+
+	devcd = kzalloc(struct_size(devcd, files, num_files), gfp);
+	if (!devcd)
+		return NULL;
+
+	devcd->num_files = num_files;
+
+	for (i = 0; i < devcd->num_files; i++) {
+		struct devcd_file *file = &devcd->files[i];
+
+		sysfs_bin_attr_init(&file->bin_attr);
+		file->bin_attr.attr.name = files[i].name;
+
+		file->bin_attr.attr.mode = 0600;
+		file->bin_attr.size = files[i].datalen;
+		file->bin_attr.private = files[i].data;
+		file->bin_attr.read = devcd_data_read;
+		file->bin_attr.write = devcd_data_write;
+
+		file->read = files[i].read;
+		file->free = files[i].free;
+	}
+
+	return devcd;
+}
+
 /**
- * dev_coredumpm - create device coredump with read/free methods
+ * dev_coredumpm_bulk - create a number of device coredump files
  * @dev: the struct device for the crashed device
  * @owner: the module that contains the read/free functions, use %THIS_MODULE
- * @data: data cookie for the @read/@free functions
- * @datalen: length of the data
  * @gfp: allocation flags
- * @read: function to read from the given buffer
- * @free: function to free the given buffer
+ * @files: the configuration of device coredump files
+ * @num_files: the number of device coredump files to create
  *
- * Creates a new device coredump for the given device. If a previous one hasn't
- * been read yet, the new coredump is discarded. The data lifetime is determined
- * by the device coredump framework and when it is no longer needed the @free
- * function will be called to free the data.
+ * This function allows drivers to create several device coredump files in
+ * one crashed device.
  */
-void dev_coredumpm(struct device *dev, struct module *owner,
-		   void *data, size_t datalen, gfp_t gfp,
-		   ssize_t (*read)(char *buffer, loff_t offset, size_t count,
-				   void *data, size_t datalen),
-		   void (*free)(void *data))
+void dev_coredumpm_bulk(struct device *dev, struct module *owner, gfp_t gfp,
+			struct dev_coredumpm_bulk_data *files, int num_files)
 {
 	static atomic_t devcd_count = ATOMIC_INIT(0);
 	struct devcd_entry *devcd;
 	struct device *existing;
+	int i;
 
 	if (devcd_disabled)
 		goto free;
@@ -272,15 +293,11 @@ void dev_coredumpm(struct device *dev, struct module *owner,
 	if (!try_module_get(owner))
 		goto free;
 
-	devcd = kzalloc(sizeof(*devcd), gfp);
+	devcd = devcd_alloc(files, num_files, gfp);
 	if (!devcd)
 		goto put_module;
 
 	devcd->owner = owner;
-	devcd->data = data;
-	devcd->datalen = datalen;
-	devcd->read = read;
-	devcd->free = free;
 	devcd->failing_dev = get_device(dev);
 
 	device_initialize(&devcd->devcd_dev);
@@ -292,6 +309,12 @@ void dev_coredumpm(struct device *dev, struct module *owner,
 	if (device_add(&devcd->devcd_dev))
 		goto put_device;
 
+	for (i = 0; i < devcd->num_files; i++) {
+		if (device_create_bin_file(&devcd->devcd_dev,
+					   &devcd->files[i].bin_attr))
+			/* nothing - some files will be missing */;
+	}
+
 	if (sysfs_create_link(&devcd->devcd_dev.kobj, &dev->kobj,
 			      "failing_device"))
 		/* nothing - symlink will be missing */;
@@ -309,7 +332,41 @@ void dev_coredumpm(struct device *dev, struct module *owner,
  put_module:
 	module_put(owner);
  free:
-	free(data);
+	for (i = 0; i < num_files; i++)
+		files[i].free(files[i].data);
+}
+EXPORT_SYMBOL_GPL(dev_coredumpm_bulk);
+
+/**
+ * dev_coredumpm - create device coredump with read/free methods
+ * @dev: the struct device for the crashed device
+ * @owner: the module that contains the read/free functions, use %THIS_MODULE
+ * @data: data cookie for the @read/@free functions
+ * @datalen: length of the data
+ * @gfp: allocation flags
+ * @read: function to read from the given buffer
+ * @free: function to free the given buffer
+ *
+ * Creates a new device coredump for the given device. If a previous one hasn't
+ * been read yet, the new coredump is discarded. The data lifetime is determined
+ * by the device coredump framework and when it is no longer needed the @free
+ * function will be called to free the data.
+ */
+void dev_coredumpm(struct device *dev, struct module *owner,
+		   void *data, size_t datalen, gfp_t gfp,
+		   ssize_t (*read)(char *buffer, loff_t offset, size_t count,
+				   void *data, size_t datalen),
+		   void (*free)(void *data))
+{
+	struct dev_coredumpm_bulk_data bulk_data = {
+		.name = "data",
+		.data = data,
+		.datalen = datalen,
+		.read = read,
+		.free = free,
+	};
+
+	dev_coredumpm_bulk(dev, owner, gfp, &bulk_data, 1);
 }
 EXPORT_SYMBOL_GPL(dev_coredumpm);
 
diff --git a/include/linux/devcoredump.h b/include/linux/devcoredump.h
index 269521f..9addb6f 100644
--- a/include/linux/devcoredump.h
+++ b/include/linux/devcoredump.h
@@ -65,6 +65,26 @@ static inline void _devcd_free_sgtable(struct scatterlist *table)
 	kfree(delete_iter);
 }
 
+/**
+ * struct dev_coredumpm_bulk_data - Data used for dev_coredumpm_bulk
+ *
+ * @name: coredump file name
+ * @data: data cookie for the @read/@free functions
+ * @datalen: length of the data
+ * @read: function to read from the given buffer
+ * @free: function to free the given buffer
+ *
+ * An array of this structure is passed as argument to dev_coredumpm_bulk, and
+ * used to describe each device coredump.
+ */
+struct dev_coredumpm_bulk_data {
+	char *name;
+	void *data;
+	size_t datalen;
+	ssize_t (*read)(char *buffer, loff_t offset, size_t count,
+			void *data, size_t datalen);
+	void (*free)(void *data);
+};
 
 #ifdef CONFIG_DEV_COREDUMP
 void dev_coredumpv(struct device *dev, void *data, size_t datalen,
@@ -76,6 +96,9 @@ void dev_coredumpm(struct device *dev, struct module *owner,
 				   void *data, size_t datalen),
 		   void (*free)(void *data));
 
+void dev_coredumpm_bulk(struct device *dev, struct module *owner, gfp_t gfp,
+			struct dev_coredumpm_bulk_data *files, int num_files);
+
 void dev_coredumpsg(struct device *dev, struct scatterlist *table,
 		    size_t datalen, gfp_t gfp);
 #else
@@ -95,6 +118,16 @@ dev_coredumpm(struct device *dev, struct module *owner,
 	free(data);
 }
 
+static inline
+void dev_coredumpm_bulk(struct device *dev, struct module *owner, gfp_t gfp,
+			struct dev_coredumpm_bulk_data *files, int num_files)
+{
+	int i;
+
+	for (i = 0; i < num_files; i++)
+		files[i].free(files[i].data);
+}
+
 static inline void dev_coredumpsg(struct device *dev, struct scatterlist *table,
 				  size_t datalen, gfp_t gfp)
 {
-- 
2.7.4



* [PATCH v3 4/7] nvme: add basic facility to get telemetry log page
  2019-05-12 15:54 [PATCH v3 0/7] nvme-pci: support device coredump Akinobu Mita
                   ` (2 preceding siblings ...)
  2019-05-12 15:54 ` [PATCH v3 3/7] devcoredump: allow to create several coredump files in one device Akinobu Mita
@ 2019-05-12 15:54 ` Akinobu Mita
  2019-05-13 15:34   ` Chaitanya Kulkarni
  2019-05-12 15:54 ` [PATCH v3 5/7] nvme-pci: add device coredump infrastructure Akinobu Mita
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 22+ messages in thread
From: Akinobu Mita @ 2019-05-12 15:54 UTC (permalink / raw)
  To: linux-nvme, linux-kernel
  Cc: Akinobu Mita, Johannes Berg, Keith Busch, Jens Axboe,
	Christoph Hellwig, Sagi Grimberg, Minwoo Im, Kenneth Heitke

This adds the required definitions to get the telemetry log page.
The telemetry log page structure and identifier are copied from nvme-cli.

We also need a facility to check the log page attributes in order to know
whether the controller supports the telemetry log pages and the log page
offset field for the Get Log Page command.  The telemetry data area could
be larger than the maximum data transfer size, so we may need to split the
read into multiple transfers with an incrementing log page offset, as
sketched below.
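
A minimal sketch of the chunked read this enables (the real code lands in
patch 5/7; 'buf' and LOG_SIZE are placeholders, and the multiplication
overflow handling is omitted):

	u32 chunk_size = ctrl->max_hw_sectors << 9;	/* MDTS in bytes */
	loff_t pos = 0;

	while (pos < LOG_SIZE) {
		size_t n = min_t(size_t, LOG_SIZE - pos, chunk_size);
		int ret;

		ret = nvme_get_log(ctrl, NVME_NSID_ALL, NVME_LOG_TELEMETRY_CTRL,
				   0, buf + pos, n, pos);
		if (ret)
			return ret;
		pos += n;	/* continue at the next log page offset */
	}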

Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Minwoo Im <minwoo.im.dev@gmail.com>
Cc: Kenneth Heitke <kenneth.heitke@intel.com>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
---
* v3
- Merge 'add telemetry log page definitions' patch and 'add facility to
  check log page attributes' patch
- Copy struct nvme_telemetry_log_page_hdr from the latest nvme-cli
- Add BUILD_BUG_ON for the size of struct nvme_telemetry_log_page_hdr

 drivers/nvme/host/core.c |  2 ++
 drivers/nvme/host/nvme.h |  1 +
 include/linux/nvme.h     | 17 +++++++++++++++++
 3 files changed, 20 insertions(+)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index a6644a2..0cea2a8 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2585,6 +2585,7 @@ int nvme_init_identify(struct nvme_ctrl *ctrl)
 	} else
 		ctrl->shutdown_timeout = shutdown_timeout;
 
+	ctrl->lpa = id->lpa;
 	ctrl->npss = id->npss;
 	ctrl->apsta = id->apsta;
 	prev_apst_enabled = ctrl->apst_enabled;
@@ -3898,6 +3899,7 @@ static inline void _nvme_check_size(void)
 	BUILD_BUG_ON(sizeof(struct nvme_id_ctrl) != NVME_IDENTIFY_DATA_SIZE);
 	BUILD_BUG_ON(sizeof(struct nvme_id_ns) != NVME_IDENTIFY_DATA_SIZE);
 	BUILD_BUG_ON(sizeof(struct nvme_lba_range_type) != 64);
+	BUILD_BUG_ON(sizeof(struct nvme_telemetry_log_page_hdr) != 512);
 	BUILD_BUG_ON(sizeof(struct nvme_smart_log) != 512);
 	BUILD_BUG_ON(sizeof(struct nvme_dbbuf) != 64);
 	BUILD_BUG_ON(sizeof(struct nvme_directive_cmd) != 64);
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 5ee75b5..7f6f1fc 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -195,6 +195,7 @@ struct nvme_ctrl {
 	u32 vs;
 	u32 sgls;
 	u16 kas;
+	u8 lpa;
 	u8 npss;
 	u8 apsta;
 	u32 oaes;
diff --git a/include/linux/nvme.h b/include/linux/nvme.h
index c40720c..8c0b29d 100644
--- a/include/linux/nvme.h
+++ b/include/linux/nvme.h
@@ -294,6 +294,8 @@ enum {
 	NVME_CTRL_OACS_DIRECTIVES		= 1 << 5,
 	NVME_CTRL_OACS_DBBUF_SUPP		= 1 << 8,
 	NVME_CTRL_LPA_CMD_EFFECTS_LOG		= 1 << 1,
+	NVME_CTRL_LPA_EXTENDED_DATA		= 1 << 2,
+	NVME_CTRL_LPA_TELEMETRY_LOG		= 1 << 3,
 };
 
 struct nvme_lbaf {
@@ -396,6 +398,20 @@ enum {
 	NVME_NIDT_UUID		= 0x03,
 };
 
+struct nvme_telemetry_log_page_hdr {
+	__u8    lpi; /* Log page identifier */
+	__u8    rsvd[4];
+	__u8    iee_oui[3];
+	__le16  dalb1; /* Data area 1 last block */
+	__le16  dalb2; /* Data area 2 last block */
+	__le16  dalb3; /* Data area 3 last block */
+	__u8    rsvd1[368];
+	__u8    ctrlavail; /* Controller initiated data avail?*/
+	__u8    ctrldgn; /* Controller initiated telemetry Data Gen # */
+	__u8    rsnident[128];
+	__u8    telemetry_dataarea[0];
+};
+
 struct nvme_smart_log {
 	__u8			critical_warning;
 	__u8			temperature[2];
@@ -832,6 +848,7 @@ enum {
 	NVME_LOG_FW_SLOT	= 0x03,
 	NVME_LOG_CHANGED_NS	= 0x04,
 	NVME_LOG_CMD_EFFECTS	= 0x05,
+	NVME_LOG_TELEMETRY_CTRL	= 0x08,
 	NVME_LOG_ANA		= 0x0c,
 	NVME_LOG_DISC		= 0x70,
 	NVME_LOG_RESERVATION	= 0x80,
-- 
2.7.4



* [PATCH v3 5/7] nvme-pci: add device coredump infrastructure
  2019-05-12 15:54 [PATCH v3 0/7] nvme-pci: support device coredump Akinobu Mita
                   ` (3 preceding siblings ...)
  2019-05-12 15:54 ` [PATCH v3 4/7] nvme: add basic facility to get telemetry log page Akinobu Mita
@ 2019-05-12 15:54 ` Akinobu Mita
  2019-05-13 13:50   ` Keith Busch
  2019-05-13 14:02   ` Christoph Hellwig
  2019-05-12 15:54 ` [PATCH v3 6/7] nvme-pci: trigger device coredump on command timeout Akinobu Mita
                   ` (3 subsequent siblings)
  8 siblings, 2 replies; 22+ messages in thread
From: Akinobu Mita @ 2019-05-12 15:54 UTC (permalink / raw)
  To: linux-nvme, linux-kernel
  Cc: Akinobu Mita, Johannes Berg, Keith Busch, Jens Axboe,
	Christoph Hellwig, Sagi Grimberg, Minwoo Im, Kenneth Heitke

This provides three functions to implement the device coredump for the
nvme driver.

nvme_coredump_init() - This function is called when the driver decides
to start collecting a device coredump.  It captures snapshots of the
controller registers and the admin and IO queues.

nvme_coredump_logs() - This function is called as soon as the device
recovers from the crash and the admin queue becomes available.  If a
device coredump has already been started by nvme_coredump_init(), the
telemetry controller-initiated data is collected.  Otherwise it does
nothing.

nvme_coredump_complete() - This function is called when the driver
determines that there is nothing more to collect for the device
coredump.  All collected coredump files are exported via the device
coredump mechanism.

After finishing the nvme device coredump, the following files are created.

- regs: NVMe controller registers (00h to 4Fh)
- sq<qid>: Submission queue
- cq<qid>: Completion queue
- telemetry-ctrl-log: Telemetry controller-initiated log (if available)
- data: Empty

The reason for an empty 'data' file is to provide a uniform way to notify
that the device coredump is no longer needed: userspace writes to the
'data' file.

All existing drivers using the device coredump provide a 'data' file; if
the nvme device coredump didn't provide one, userspace programs would need
to know which driver provides which coredump files.
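
Roughly, the expected call sequence in the error recovery path (wired up
in patch 6/7) is:

	nvme_coredump_init(dev);	/* on timeout, before reset */
	/* ... controller reset ... */
	nvme_coredump_logs(dev);	/* admin queue back: fetch telemetry log */
	nvme_coredump_complete(dev);	/* export all files via devcoredump */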

Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Minwoo Im <minwoo.im.dev@gmail.com>
Cc: Kenneth Heitke <kenneth.heitke@intel.com>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
---
* v3
- Fix typo s/machanism/mechanism/ in commit log
- Fix max transfer size calculation for get log page
- Add function comments
- Extract 'enable to trigger device coredump by hand' patch

 drivers/nvme/host/Kconfig |   1 +
 drivers/nvme/host/core.c  |   1 +
 drivers/nvme/host/pci.c   | 448 ++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 450 insertions(+)

diff --git a/drivers/nvme/host/Kconfig b/drivers/nvme/host/Kconfig
index 0f345e2..c3a06af 100644
--- a/drivers/nvme/host/Kconfig
+++ b/drivers/nvme/host/Kconfig
@@ -5,6 +5,7 @@ config BLK_DEV_NVME
 	tristate "NVM Express block device"
 	depends on PCI && BLOCK
 	select NVME_CORE
+	select WANT_DEV_COREDUMP
 	---help---
 	  The NVM Express driver is for solid state drives directly
 	  connected to the PCI or PCI Express bus.  If you know you
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 0cea2a8..172551b 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2462,6 +2462,7 @@ int nvme_get_log(struct nvme_ctrl *ctrl, u32 nsid, u8 log_page, u8 lsp,
 
 	return nvme_submit_sync_cmd(ctrl->admin_q, &c, log, size);
 }
+EXPORT_SYMBOL_GPL(nvme_get_log);
 
 static int nvme_get_effects_log(struct nvme_ctrl *ctrl)
 {
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 3e4fb89..3eebb98 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -9,6 +9,7 @@
 #include <linux/blkdev.h>
 #include <linux/blk-mq.h>
 #include <linux/blk-mq-pci.h>
+#include <linux/devcoredump.h>
 #include <linux/dmi.h>
 #include <linux/init.h>
 #include <linux/interrupt.h>
@@ -89,6 +90,10 @@ struct nvme_queue;
 static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown);
 static bool __nvme_disable_io_queues(struct nvme_dev *dev, u8 opcode);
 
+static void __maybe_unused nvme_coredump_init(struct nvme_dev *dev);
+static void __maybe_unused nvme_coredump_logs(struct nvme_dev *dev);
+static void __maybe_unused nvme_coredump_complete(struct nvme_dev *dev);
+
 /*
  * Represents an NVM Express device.  Each nvme_dev is a PCI function.
  */
@@ -131,6 +136,9 @@ struct nvme_dev {
 	dma_addr_t host_mem_descs_dma;
 	struct nvme_host_mem_buf_desc *host_mem_descs;
 	void **host_mem_desc_bufs;
+
+	struct dev_coredumpm_bulk_data *dumps;
+	int num_dumps;
 };
 
 static int io_queue_depth_set(const char *val, const struct kernel_param *kp)
@@ -2849,6 +2857,446 @@ static int nvme_resume(struct device *dev)
 
 static SIMPLE_DEV_PM_OPS(nvme_dev_pm_ops, nvme_suspend, nvme_resume);
 
+#ifdef CONFIG_DEV_COREDUMP
+
+static ssize_t nvme_coredump_read(char *buffer, loff_t offset, size_t count,
+				  void *data, size_t datalen)
+{
+	return memory_read_from_buffer(buffer, count, &offset, data, datalen);
+}
+
+static void nvme_coredump_free(void *data)
+{
+	kvfree(data);
+}
+
+static int nvme_coredump_empty(struct dev_coredumpm_bulk_data *data)
+{
+	data->name = kstrdup("data", GFP_KERNEL);
+	if (!data->name)
+		return -ENOMEM;
+
+	data->data = NULL;
+	data->datalen = 0;
+	data->read = nvme_coredump_read;
+	data->free = nvme_coredump_free;
+
+	return 0;
+}
+
+static int nvme_coredump_regs(struct dev_coredumpm_bulk_data *data,
+			      struct nvme_ctrl *ctrl)
+{
+	const int reg_size = 0x50; /* 00h to 4Fh */
+
+	data->name = kstrdup("regs", GFP_KERNEL);
+	if (!data->name)
+		return -ENOMEM;
+
+	data->data = kvzalloc(reg_size, GFP_KERNEL);
+	if (!data->data) {
+		kfree(data->name);
+		return -ENOMEM;
+	}
+	memcpy_fromio(data->data, to_nvme_dev(ctrl)->bar, reg_size);
+
+	data->datalen = reg_size;
+	data->read = nvme_coredump_read;
+	data->free = nvme_coredump_free;
+
+	return 0;
+}
+
+static void *kvmemdup(const void *src, size_t len, gfp_t gfp)
+{
+	void *p;
+
+	p = kvmalloc(len, gfp);
+	if (p)
+		memcpy(p, src, len);
+
+	return p;
+}
+
+static int nvme_coredump_queues(struct dev_coredumpm_bulk_data *bulk_data,
+				struct nvme_ctrl *ctrl)
+{
+	int i;
+
+	for (i = 0; i < ctrl->queue_count; i++) {
+		struct dev_coredumpm_bulk_data *data = &bulk_data[2 * i];
+		struct nvme_queue *nvmeq = &to_nvme_dev(ctrl)->queues[i];
+
+		data[0].name = kasprintf(GFP_KERNEL, "sq%d", i);
+		data[0].data = kvmemdup(nvmeq->sq_cmds,
+					SQ_SIZE(nvmeq->q_depth), GFP_KERNEL);
+		data[0].datalen = SQ_SIZE(nvmeq->q_depth);
+		data[0].read = nvme_coredump_read;
+		data[0].free = nvme_coredump_free;
+
+		data[1].name = kasprintf(GFP_KERNEL, "cq%d", i);
+		data[1].data = kvmemdup((void *)nvmeq->cqes,
+					CQ_SIZE(nvmeq->q_depth), GFP_KERNEL);
+		data[1].datalen = CQ_SIZE(nvmeq->q_depth);
+		data[1].read = nvme_coredump_read;
+		data[1].free = nvme_coredump_free;
+
+		if (!data[0].name || !data[1].name ||
+		    !data[0].data || !data[1].data)
+			goto free;
+	}
+
+	return 0;
+free:
+	for (; i >= 0; i--) {
+		struct dev_coredumpm_bulk_data *data = &bulk_data[2 * i];
+
+		kfree(data[0].name);
+		kfree(data[1].name);
+		kvfree(data[0].data);
+		kvfree(data[1].data);
+	}
+
+	return -ENOMEM;
+}
+
+static struct
+dev_coredumpm_bulk_data *nvme_coredump_alloc(struct nvme_dev *dev, int n)
+{
+	struct dev_coredumpm_bulk_data *data;
+
+	data = krealloc(dev->dumps, sizeof(*data) * (dev->num_dumps + n),
+			GFP_KERNEL | __GFP_ZERO);
+	if (!data)
+		return NULL;
+
+	dev->dumps = data;
+	data += dev->num_dumps;
+	dev->num_dumps += n;
+
+	return data;
+}
+
+static void __nvme_coredump_discard(struct nvme_dev *dev, bool free_data)
+{
+	int i;
+
+	for (i = 0; i < dev->num_dumps; i++) {
+		kfree(dev->dumps[i].name);
+		if (free_data)
+			dev->dumps[i].free(dev->dumps[i].data);
+	}
+
+	kfree(dev->dumps);
+	dev->dumps = NULL;
+	dev->num_dumps = 0;
+}
+
+static void nvme_coredump_discard(struct nvme_dev *dev)
+{
+	__nvme_coredump_discard(dev, true);
+}
+
+static void nvme_coredump_clear(struct nvme_dev *dev)
+{
+	__nvme_coredump_discard(dev, false);
+}
+
+/**
+ * nvme_coredump_init() - start collecting device coredump
+ * @dev: the struct nvme_dev for the crashed device
+ *
+ * This function is called when the driver determines to start collecting
+ * device coredump.  The snapshots of the controller registers, and admin and
+ * IO queues are captured by this.
+ */
+static void nvme_coredump_init(struct nvme_dev *dev)
+{
+	struct nvme_ctrl *ctrl = &dev->ctrl;
+	struct dev_coredumpm_bulk_data *bulk_data;
+	int ret;
+
+	if (WARN_ON(dev->dumps))
+		nvme_coredump_discard(dev);
+
+	bulk_data = nvme_coredump_alloc(dev, 2 + 2 * ctrl->queue_count);
+	if (!bulk_data)
+		return;
+
+	ret = nvme_coredump_empty(&bulk_data[0]);
+	if (ret)
+		goto free_bulk_data;
+
+	ret = nvme_coredump_regs(&bulk_data[1], ctrl);
+	if (ret)
+		goto free_bulk_data;
+
+	ret = nvme_coredump_queues(&bulk_data[2], ctrl);
+	if (ret)
+		goto free_bulk_data;
+
+	return;
+
+free_bulk_data:
+	nvme_coredump_discard(dev);
+}
+
+static ssize_t nvme_coredump_read_sgtable(char *buffer, loff_t offset,
+					  size_t buf_len, void *data,
+					  size_t data_len)
+{
+	struct sg_table *table = data;
+
+	if (data_len <= offset)
+		return 0;
+
+	if (offset + buf_len > data_len)
+		buf_len = data_len - offset;
+
+	return sg_pcopy_to_buffer(table->sgl, sg_nents(table->sgl), buffer,
+				  buf_len, offset);
+}
+
+static void nvme_coredump_free_sgtable(void *data)
+{
+	struct sg_table *table = data;
+	struct scatterlist *sg;
+	int n = sg_nents(table->sgl);
+	int i;
+
+	for_each_sg(table->sgl, sg, n, i) {
+		struct page *page = sg_page(sg);
+
+		if (page)
+			__free_page(page);
+
+	}
+
+	kfree(table);
+}
+
+static struct sg_table *nvme_coredump_alloc_sgtable(size_t bytes)
+{
+	struct sg_table *table;
+	struct scatterlist *sg;
+	int n = DIV_ROUND_UP(bytes, PAGE_SIZE);
+	int i;
+
+	table = kzalloc(sizeof(*table), GFP_KERNEL);
+	if (!table)
+		return NULL;
+
+	if (sg_alloc_table(table, n, GFP_KERNEL))
+		goto free_table;
+
+	for_each_sg(table->sgl, sg, n, i) {
+		struct page *page = alloc_page(GFP_KERNEL);
+		size_t size = min_t(int, bytes, PAGE_SIZE);
+
+		if (!page)
+			goto free_page;
+
+		sg_set_page(sg, page, size, 0);
+		bytes -= size;
+	}
+
+	return table;
+free_page:
+	for_each_sg(table->sgl, sg, n, i) {
+		struct page *page = sg_page(sg);
+
+		if (page)
+			__free_page(page);
+
+	}
+free_table:
+	kfree(table);
+
+	return NULL;
+}
+
+static int nvme_get_telemetry_log_blocks(struct nvme_ctrl *ctrl, void *buf,
+					 size_t bytes, loff_t offset)
+{
+	loff_t pos = 0;
+	u32 chunk_size;
+
+	if (check_mul_overflow(ctrl->max_hw_sectors, 512u, &chunk_size))
+		chunk_size = UINT_MAX;
+
+	while (pos < bytes) {
+		size_t size = min_t(size_t, bytes - pos, chunk_size);
+		int ret;
+
+		ret = nvme_get_log(ctrl, NVME_NSID_ALL, NVME_LOG_TELEMETRY_CTRL,
+				   0, buf + pos, size, offset + pos);
+		if (ret)
+			return ret;
+
+		pos += size;
+	}
+
+	return 0;
+}
+
+static int nvme_get_telemetry_log(struct nvme_ctrl *ctrl,
+				  struct sg_table *table, size_t bytes)
+{
+	int n = sg_nents(table->sgl);
+	struct scatterlist *sg;
+	size_t offset = 0;
+	int i;
+
+	for_each_sg(table->sgl, sg, n, i) {
+		struct page *page = sg_page(sg);
+		size_t size = min_t(int, bytes - offset, sg->length);
+		int ret;
+
+		ret = nvme_get_telemetry_log_blocks(ctrl, page_address(page),
+						    size, offset);
+		if (ret)
+			return ret;
+
+		offset += size;
+	}
+
+	return 0;
+}
+
+static int nvme_coredump_telemetry_log(struct dev_coredumpm_bulk_data *data,
+				       struct nvme_ctrl *ctrl)
+{
+	struct nvme_telemetry_log_page_hdr *page_hdr;
+	struct sg_table *table;
+	int log_size;
+	int ret;
+	u8 ctrldgn;
+
+	if (!(ctrl->lpa & NVME_CTRL_LPA_TELEMETRY_LOG))
+		return -EINVAL;
+	if (!(ctrl->lpa & NVME_CTRL_LPA_EXTENDED_DATA))
+		return -EINVAL;
+
+	page_hdr = kzalloc(sizeof(*page_hdr), GFP_KERNEL);
+	if (!page_hdr)
+		return -ENOMEM;
+
+	ret = nvme_get_log(ctrl, NVME_NSID_ALL, NVME_LOG_TELEMETRY_CTRL, 0,
+			   page_hdr, sizeof(*page_hdr), 0);
+	if (ret)
+		goto free_page_hdr;
+
+	if (!page_hdr->ctrlavail) {
+		ret = -EINVAL;
+		goto free_page_hdr;
+	}
+
+	log_size = (le16_to_cpu(page_hdr->dalb3) + 1) * 512;
+
+	table = nvme_coredump_alloc_sgtable(log_size);
+	if (!table) {
+		ret = -ENOMEM;
+		goto free_page_hdr;
+	}
+
+	ret = nvme_get_telemetry_log(ctrl, table, log_size);
+	if (ret)
+		goto free_table;
+
+	sg_pcopy_to_buffer(table->sgl, sg_nents(table->sgl), &ctrldgn,
+			   sizeof(ctrldgn),
+			   offsetof(typeof(*page_hdr), ctrldgn));
+
+	ret = nvme_get_log(ctrl, NVME_NSID_ALL, NVME_LOG_TELEMETRY_CTRL, 0,
+			   page_hdr, sizeof(*page_hdr), 0);
+	if (ret)
+		goto free_table;
+
+	if (page_hdr->ctrldgn != ctrldgn) {
+		ret = -EINVAL;
+		goto free_table;
+	}
+
+	data->name = kstrdup("telemetry-ctrl-log", GFP_KERNEL);
+	if (!data->name) {
+		ret = -ENOMEM;
+		goto free_table;
+	}
+
+	data->data = table;
+	data->datalen = log_size;
+	data->read = nvme_coredump_read_sgtable;
+	data->free = nvme_coredump_free_sgtable;
+
+	kfree(page_hdr);
+
+	return 0;
+free_table:
+	nvme_coredump_free_sgtable(table);
+free_page_hdr:
+	kfree(page_hdr);
+
+	return ret;
+}
+
+/**
+ * nvme_coredump_logs() - get telemetry log if available
+ * @dev: the struct nvme_dev for the crashed device
+ *
+ * This function is called as soon as the device is recovered from the crash
+ * and admin queue becomes available.  If the device coredump has already been
+ * started by nvme_coredump_init(), the telemetry controller-initiated data
+ * will be collected.  Otherwise do nothing.
+ */
+static void nvme_coredump_logs(struct nvme_dev *dev)
+{
+	struct dev_coredumpm_bulk_data *bulk_data;
+
+	if (!dev->dumps)
+		return;
+
+	bulk_data = nvme_coredump_alloc(dev, 1);
+	if (!bulk_data)
+		return;
+
+	if (nvme_coredump_telemetry_log(bulk_data, &dev->ctrl))
+		dev->num_dumps--;
+}
+
+/**
+ * nvme_coredump_complete() - finish device coredump
+ * @dev: the struct nvme_dev for the crashed device
+ *
+ * This function is called when the driver determines that there is nothing
+ * to collect device coredump anymore.  All collected coredumps are exported
+ * via device coredump mechanism.
+ */
+static void nvme_coredump_complete(struct nvme_dev *dev)
+{
+	if (!dev->dumps)
+		return;
+
+	dev_coredumpm_bulk(dev->dev, THIS_MODULE, GFP_KERNEL,
+			   dev->dumps, dev->num_dumps);
+	nvme_coredump_clear(dev);
+}
+
+#else
+
+static void nvme_coredump_init(struct nvme_dev *dev)
+{
+}
+
+static void nvme_coredump_logs(struct nvme_dev *dev)
+{
+}
+
+static void nvme_coredump_complete(struct nvme_dev *dev)
+{
+}
+
+#endif /* CONFIG_DEV_COREDUMP */
+
 static pci_ers_result_t nvme_error_detected(struct pci_dev *pdev,
 						pci_channel_state_t state)
 {
-- 
2.7.4



* [PATCH v3 6/7] nvme-pci: trigger device coredump on command timeout
  2019-05-12 15:54 [PATCH v3 0/7] nvme-pci: support device coredump Akinobu Mita
                   ` (4 preceding siblings ...)
  2019-05-12 15:54 ` [PATCH v3 5/7] nvme-pci: add device coredump infrastructure Akinobu Mita
@ 2019-05-12 15:54 ` Akinobu Mita
  2019-05-13 13:52   ` Keith Busch
  2019-05-12 15:54 ` [PATCH v3 7/7] nvme-pci: enable to trigger device coredump by hand Akinobu Mita
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 22+ messages in thread
From: Akinobu Mita @ 2019-05-12 15:54 UTC (permalink / raw)
  To: linux-nvme, linux-kernel
  Cc: Akinobu Mita, Johannes Berg, Keith Busch, Jens Axboe,
	Christoph Hellwig, Sagi Grimberg, Minwoo Im, Kenneth Heitke

This enables the nvme driver to trigger a device coredump when a command
timeout occurs, which helps diagnose and debug issues.

This can be tested with fail_io_timeout fault injection.

	# echo 1 > /sys/kernel/debug/fail_io_timeout/probability
	# echo 1 > /sys/kernel/debug/fail_io_timeout/times
	# echo 1 > /sys/block/nvme0n1/io-timeout-fail
	# dd if=/dev/nvme0n1 of=/dev/null

Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Minwoo Im <minwoo.im.dev@gmail.com>
Cc: Kenneth Heitke <kenneth.heitke@intel.com>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
---
* v3
- Don't try to get telemetry log when admin queue is not available

 drivers/nvme/host/pci.c | 39 +++++++++++++++++++++++----------------
 1 file changed, 23 insertions(+), 16 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 3eebb98..6522592 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -87,12 +87,12 @@ MODULE_PARM_DESC(poll_queues, "Number of queues to use for polled IO.");
 struct nvme_dev;
 struct nvme_queue;
 
-static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown);
+static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown, bool dump);
 static bool __nvme_disable_io_queues(struct nvme_dev *dev, u8 opcode);
 
-static void __maybe_unused nvme_coredump_init(struct nvme_dev *dev);
-static void __maybe_unused nvme_coredump_logs(struct nvme_dev *dev);
-static void __maybe_unused nvme_coredump_complete(struct nvme_dev *dev);
+static void nvme_coredump_init(struct nvme_dev *dev);
+static void nvme_coredump_logs(struct nvme_dev *dev);
+static void nvme_coredump_complete(struct nvme_dev *dev);
 
 /*
  * Represents an NVM Express device.  Each nvme_dev is a PCI function.
@@ -1280,7 +1280,7 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
 	 */
 	if (nvme_should_reset(dev, csts)) {
 		nvme_warn_reset(dev, csts);
-		nvme_dev_disable(dev, false);
+		nvme_dev_disable(dev, false, true);
 		nvme_reset_ctrl(&dev->ctrl);
 		return BLK_EH_DONE;
 	}
@@ -1309,7 +1309,7 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
 		dev_warn_ratelimited(dev->ctrl.device,
 			 "I/O %d QID %d timeout, disable controller\n",
 			 req->tag, nvmeq->qid);
-		nvme_dev_disable(dev, shutdown);
+		nvme_dev_disable(dev, shutdown, true);
 		nvme_req(req)->flags |= NVME_REQ_CANCELLED;
 		return BLK_EH_DONE;
 	default:
@@ -1325,7 +1325,7 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
 		dev_warn(dev->ctrl.device,
 			 "I/O %d QID %d timeout, reset controller\n",
 			 req->tag, nvmeq->qid);
-		nvme_dev_disable(dev, false);
+		nvme_dev_disable(dev, false, true);
 		nvme_reset_ctrl(&dev->ctrl);
 
 		nvme_req(req)->flags |= NVME_REQ_CANCELLED;
@@ -2382,7 +2382,7 @@ static void nvme_pci_disable(struct nvme_dev *dev)
 	}
 }
 
-static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
+static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown, bool dump)
 {
 	bool dead = true;
 	struct pci_dev *pdev = to_pci_dev(dev->dev);
@@ -2407,6 +2407,9 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
 			nvme_wait_freeze_timeout(&dev->ctrl, NVME_IO_TIMEOUT);
 	}
 
+	if (dump)
+		nvme_coredump_init(dev);
+
 	nvme_stop_queues(&dev->ctrl);
 
 	if (!dead && dev->ctrl.queue_count > 0) {
@@ -2477,7 +2480,7 @@ static void nvme_remove_dead_ctrl(struct nvme_dev *dev, int status)
 	dev_warn(dev->ctrl.device, "Removing after probe failure status: %d\n", status);
 
 	nvme_get_ctrl(&dev->ctrl);
-	nvme_dev_disable(dev, false);
+	nvme_dev_disable(dev, false, false);
 	nvme_kill_queues(&dev->ctrl);
 	if (!queue_work(nvme_wq, &dev->remove_work))
 		nvme_put_ctrl(&dev->ctrl);
@@ -2499,7 +2502,7 @@ static void nvme_reset_work(struct work_struct *work)
 	 * moving on.
 	 */
 	if (dev->ctrl.ctrl_config & NVME_CC_ENABLE)
-		nvme_dev_disable(dev, false);
+		nvme_dev_disable(dev, false, false);
 
 	mutex_lock(&dev->shutdown_lock);
 	result = nvme_pci_enable(dev);
@@ -2536,6 +2539,9 @@ static void nvme_reset_work(struct work_struct *work)
 	if (result)
 		goto out;
 
+	nvme_coredump_logs(dev);
+	nvme_coredump_complete(dev);
+
 	if (dev->ctrl.oacs & NVME_CTRL_OACS_SEC_SUPP) {
 		if (!dev->ctrl.opal_dev)
 			dev->ctrl.opal_dev =
@@ -2598,6 +2604,7 @@ static void nvme_reset_work(struct work_struct *work)
  out_unlock:
 	mutex_unlock(&dev->shutdown_lock);
  out:
+	nvme_coredump_complete(dev);
 	nvme_remove_dead_ctrl(dev, result);
 }
 
@@ -2788,7 +2795,7 @@ static int nvme_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 static void nvme_reset_prepare(struct pci_dev *pdev)
 {
 	struct nvme_dev *dev = pci_get_drvdata(pdev);
-	nvme_dev_disable(dev, false);
+	nvme_dev_disable(dev, false, false);
 }
 
 static void nvme_reset_done(struct pci_dev *pdev)
@@ -2800,7 +2807,7 @@ static void nvme_reset_done(struct pci_dev *pdev)
 static void nvme_shutdown(struct pci_dev *pdev)
 {
 	struct nvme_dev *dev = pci_get_drvdata(pdev);
-	nvme_dev_disable(dev, true);
+	nvme_dev_disable(dev, true, false);
 }
 
 /*
@@ -2817,14 +2824,14 @@ static void nvme_remove(struct pci_dev *pdev)
 
 	if (!pci_device_is_present(pdev)) {
 		nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_DEAD);
-		nvme_dev_disable(dev, true);
+		nvme_dev_disable(dev, true, false);
 		nvme_dev_remove_admin(dev);
 	}
 
 	flush_work(&dev->ctrl.reset_work);
 	nvme_stop_ctrl(&dev->ctrl);
 	nvme_remove_namespaces(&dev->ctrl);
-	nvme_dev_disable(dev, true);
+	nvme_dev_disable(dev, true, false);
 	nvme_release_cmb(dev);
 	nvme_free_host_mem(dev);
 	nvme_dev_remove_admin(dev);
@@ -2841,7 +2848,7 @@ static int nvme_suspend(struct device *dev)
 	struct pci_dev *pdev = to_pci_dev(dev);
 	struct nvme_dev *ndev = pci_get_drvdata(pdev);
 
-	nvme_dev_disable(ndev, true);
+	nvme_dev_disable(ndev, true, false);
 	return 0;
 }
 
@@ -3313,7 +3320,7 @@ static pci_ers_result_t nvme_error_detected(struct pci_dev *pdev,
 	case pci_channel_io_frozen:
 		dev_warn(dev->ctrl.device,
 			"frozen state error detected, reset controller\n");
-		nvme_dev_disable(dev, false);
+		nvme_dev_disable(dev, false, false);
 		return PCI_ERS_RESULT_NEED_RESET;
 	case pci_channel_io_perm_failure:
 		dev_warn(dev->ctrl.device,
-- 
2.7.4



* [PATCH v3 7/7] nvme-pci: enable to trigger device coredump by hand
  2019-05-12 15:54 [PATCH v3 0/7] nvme-pci: support device coredump Akinobu Mita
                   ` (5 preceding siblings ...)
  2019-05-12 15:54 ` [PATCH v3 6/7] nvme-pci: trigger device coredump on command timeout Akinobu Mita
@ 2019-05-12 15:54 ` Akinobu Mita
       [not found] ` <CGME20190512155533epcas4p110edff15ebf5b2efae32e43f0f10ab59@epcms2p5>
       [not found] ` <CGME20190512155540epcas4p14c15eb86b08dcd281e9a93a4fc190800@epcms2p1>
  8 siblings, 0 replies; 22+ messages in thread
From: Akinobu Mita @ 2019-05-12 15:54 UTC (permalink / raw)
  To: linux-nvme, linux-kernel
  Cc: Akinobu Mita, Johannes Berg, Keith Busch, Jens Axboe,
	Christoph Hellwig, Sagi Grimberg, Minwoo Im, Kenneth Heitke

This provides a way to trigger the nvme device coredump by writing
anything to the /sys/devices/.../coredump attribute.
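
For example, with a hypothetical PCI device address:

	# echo 1 > /sys/bus/pci/devices/0000:01:00.0/coredump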

Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Minwoo Im <minwoo.im.dev@gmail.com>
Cc: Kenneth Heitke <kenneth.heitke@intel.com>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
---
* v3
- Extracted from 'add device coredump infrastructure' patch
- Avoid deadlock in .coredump callback

 drivers/nvme/host/pci.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 6522592..fad5395 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -3288,6 +3288,14 @@ static void nvme_coredump_complete(struct nvme_dev *dev)
 	nvme_coredump_clear(dev);
 }
 
+static void nvme_coredump(struct device *dev)
+{
+	struct nvme_dev *ndev = dev_get_drvdata(dev);
+
+	nvme_dev_disable(ndev, false, true);
+	nvme_reset_ctrl_sync(&ndev->ctrl);
+}
+
 #else
 
 static void nvme_coredump_init(struct nvme_dev *dev)
@@ -3302,6 +3310,10 @@ static void nvme_coredump_complete(struct nvme_dev *dev)
 {
 }
 
+static void nvme_coredump(struct device *dev)
+{
+}
+
 #endif /* CONFIG_DEV_COREDUMP */
 
 static pci_ers_result_t nvme_error_detected(struct pci_dev *pdev,
@@ -3409,6 +3421,7 @@ static struct pci_driver nvme_driver = {
 	.shutdown	= nvme_shutdown,
 	.driver		= {
 		.pm	= &nvme_dev_pm_ops,
+		.coredump = nvme_coredump,
 	},
 	.sriov_configure = pci_sriov_configure_simple,
 	.err_handler	= &nvme_err_handler,
-- 
2.7.4



* Re: [PATCH v3 6/7] nvme-pci: trigger device coredump on command timeout
       [not found] ` <CGME20190512155533epcas4p110edff15ebf5b2efae32e43f0f10ab59@epcms2p5>
@ 2019-05-13  7:41   ` Minwoo Im
  2019-05-13 14:59     ` Akinobu Mita
  0 siblings, 1 reply; 22+ messages in thread
From: Minwoo Im @ 2019-05-13  7:41 UTC (permalink / raw)
  To: linux-nvme, linux-kernel, Minwoo Im
  Cc: Jens Axboe, Sagi Grimberg, Kenneth Heitke, Akinobu Mita,
	Keith Busch, Minwoo Im, Johannes Berg, Christoph Hellwig

> -static void __maybe_unused nvme_coredump_init(struct nvme_dev *dev);
> -static void __maybe_unused nvme_coredump_logs(struct nvme_dev *dev);
> -static void __maybe_unused nvme_coredump_complete(struct nvme_dev *dev);
> +static void nvme_coredump_init(struct nvme_dev *dev);
> +static void nvme_coredump_logs(struct nvme_dev *dev);
> +static void nvme_coredump_complete(struct nvme_dev *dev);

You just added those three prototypes in the previous patch.  Did I miss
something here?


* Re: [PATCH v3 5/7] nvme-pci: add device coredump infrastructure
       [not found] ` <CGME20190512155540epcas4p14c15eb86b08dcd281e9a93a4fc190800@epcms2p1>
@ 2019-05-13  7:46   ` Minwoo Im
  2019-05-13 15:23     ` Chaitanya Kulkarni
  0 siblings, 1 reply; 22+ messages in thread
From: Minwoo Im @ 2019-05-13  7:46 UTC (permalink / raw)
  To: linux-nvme, linux-kernel, Minwoo Im
  Cc: Jens Axboe, Sagi Grimberg, Kenneth Heitke, Akinobu Mita,
	Keith Busch, Minwoo Im, Johannes Berg, Christoph Hellwig

> +static int nvme_get_telemetry_log_blocks(struct nvme_ctrl *ctrl, void *buf,
> +					 size_t bytes, loff_t offset)
> +{
> +	loff_t pos = 0;
> +	u32 chunk_size;
> +
> +	if (check_mul_overflow(ctrl->max_hw_sectors, 512u, &chunk_size))
> +		chunk_size = UINT_MAX;
> +
> +	while (pos < bytes) {
> +		size_t size = min_t(size_t, bytes - pos, chunk_size);
> +		int ret;
> +
> +		ret = nvme_get_log(ctrl, NVME_NSID_ALL, NVME_LOG_TELEMETRY_CTRL,
> +				   0, buf + pos, size, offset + pos);
> +		if (ret)
> +			return ret;
> +
> +		pos += size;
> +	}
> +
> +	return 0;
> +}
> +
> +static int nvme_get_telemetry_log(struct nvme_ctrl *ctrl,
> +				  struct sg_table *table, size_t bytes)
> +{
> +	int n = sg_nents(table->sgl);
> +	struct scatterlist *sg;
> +	size_t offset = 0;
> +	int i;
> +
> +	for_each_sg(table->sgl, sg, n, i) {
> +		struct page *page = sg_page(sg);
> +		size_t size = min_t(int, bytes - offset, sg->length);
> +		int ret;
> +
> +		ret = nvme_get_telemetry_log_blocks(ctrl, page_address(page),
> +						    size, offset);
> +		if (ret)
> +			return ret;
> +
> +		offset += size;
> +	}
> +
> +	return 0;
> +}

Can we have those two in nvme-core module instead of being in pci module?


* Re: [PATCH v3 5/7] nvme-pci: add device coredump infrastructure
  2019-05-12 15:54 ` [PATCH v3 5/7] nvme-pci: add device coredump infrastructure Akinobu Mita
@ 2019-05-13 13:50   ` Keith Busch
  2019-05-13 15:01     ` Akinobu Mita
  2019-05-13 14:02   ` Christoph Hellwig
  1 sibling, 1 reply; 22+ messages in thread
From: Keith Busch @ 2019-05-13 13:50 UTC (permalink / raw)
  To: Akinobu Mita
  Cc: linux-nvme, linux-kernel, Johannes Berg, Busch, Keith,
	Jens Axboe, Christoph Hellwig, Sagi Grimberg, Minwoo Im, Heitke,
	Kenneth

On Sun, May 12, 2019 at 08:54:15AM -0700, Akinobu Mita wrote:
> +static void nvme_coredump_logs(struct nvme_dev *dev)
> +{
> +	struct dev_coredumpm_bulk_data *bulk_data;
> +
> +	if (!dev->dumps)
> +		return;
> +
> +	bulk_data = nvme_coredump_alloc(dev, 1);
> +	if (!bulk_data)
> +		return;
> +
> +	if (nvme_coredump_telemetry_log(bulk_data, &dev->ctrl))
> +		dev->num_dumps--;
> +}

You'll need this function to return the same 'int' value from
nvme_coredump_telemetry_log. A negative value here means that the
device didn't produce a response, and that's important to check from
the reset work since you'll need to abort the reset if that happens.


* Re: [PATCH v3 6/7] nvme-pci: trigger device coredump on command timeout
  2019-05-12 15:54 ` [PATCH v3 6/7] nvme-pci: trigger device coredump on command timeout Akinobu Mita
@ 2019-05-13 13:52   ` Keith Busch
  0 siblings, 0 replies; 22+ messages in thread
From: Keith Busch @ 2019-05-13 13:52 UTC (permalink / raw)
  To: Akinobu Mita
  Cc: linux-nvme, linux-kernel, Johannes Berg, Busch, Keith,
	Jens Axboe, Christoph Hellwig, Sagi Grimberg, Minwoo Im, Heitke,
	Kenneth

On Sun, May 12, 2019 at 08:54:16AM -0700, Akinobu Mita wrote:
> @@ -2536,6 +2539,9 @@ static void nvme_reset_work(struct work_struct *work)
>  	if (result)
>  		goto out;
>  
> +	nvme_coredump_logs(dev);

If you change nvme_coredump_logs to return an int, check it here for < 0
and abandon the reset if true.
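
Concretely, a sketch of the suggested check (untested):

	result = nvme_coredump_logs(dev);
	if (result < 0)
		goto out;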

> +	nvme_coredump_complete(dev);
> +
>  	if (dev->ctrl.oacs & NVME_CTRL_OACS_SEC_SUPP) {
>  		if (!dev->ctrl.opal_dev)
>  			dev->ctrl.opal_dev =


* Re: [PATCH v3 5/7] nvme-pci: add device coredump infrastructure
  2019-05-12 15:54 ` [PATCH v3 5/7] nvme-pci: add device coredump infrastructure Akinobu Mita
  2019-05-13 13:50   ` Keith Busch
@ 2019-05-13 14:02   ` Christoph Hellwig
  2019-05-13 15:01     ` Akinobu Mita
  1 sibling, 1 reply; 22+ messages in thread
From: Christoph Hellwig @ 2019-05-13 14:02 UTC (permalink / raw)
  To: Akinobu Mita
  Cc: linux-nvme, linux-kernel, Johannes Berg, Keith Busch, Jens Axboe,
	Christoph Hellwig, Sagi Grimberg, Minwoo Im, Kenneth Heitke

Usage of a scatterlist here is rather bogus as we never use
it for dma mapping.  Why can't you store the various pages in a
large bio_vec and then just issue that to the device in one
get log page command?  (or at least a few if MDTS kicks in?)
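
A rough sketch of that direction (assuming the bv_page/bv_len/bv_offset
fields of struct bio_vec; allocation error handling omitted):

	int i, n = DIV_ROUND_UP(bytes, PAGE_SIZE);
	struct bio_vec *bvec = kcalloc(n, sizeof(*bvec), GFP_KERNEL);

	for (i = 0; i < n; i++) {
		bvec[i].bv_page = alloc_page(GFP_KERNEL);
		bvec[i].bv_len = min_t(size_t, bytes - i * PAGE_SIZE, PAGE_SIZE);
		bvec[i].bv_offset = 0;
	}
	/* ...then issue one Get Log Page command per MDTS-sized run of
	 * pages, instead of bouncing through a scatterlist. */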


* Re: [PATCH v3 6/7] nvme-pci: trigger device coredump on command timeout
  2019-05-13  7:41   ` [PATCH v3 6/7] nvme-pci: trigger device coredump on command timeout Minwoo Im
@ 2019-05-13 14:59     ` Akinobu Mita
  0 siblings, 0 replies; 22+ messages in thread
From: Akinobu Mita @ 2019-05-13 14:59 UTC (permalink / raw)
  To: minwoo.im
  Cc: linux-nvme, linux-kernel, Jens Axboe, Sagi Grimberg,
	Kenneth Heitke, Keith Busch, Minwoo Im, Johannes Berg,
	Christoph Hellwig

On Mon, May 13, 2019 at 16:41, Minwoo Im <minwoo.im@samsung.com> wrote:
>
> > -static void __maybe_unused nvme_coredump_init(struct nvme_dev *dev);
> > -static void __maybe_unused nvme_coredump_logs(struct nvme_dev *dev);
> > -static void __maybe_unused nvme_coredump_complete(struct nvme_dev *dev);
> > +static void nvme_coredump_init(struct nvme_dev *dev);
> > +static void nvme_coredump_logs(struct nvme_dev *dev);
> > +static void nvme_coredump_complete(struct nvme_dev *dev);
>
> You just have added those three prototypes in previous patch.  Did I miss
> something here?

These __maybe_unused annotations are needed only in patch 5/7, because
these functions are still unused until patch 6/7 is applied.


* Re: [PATCH v3 5/7] nvme-pci: add device coredump infrastructure
  2019-05-13 13:50   ` Keith Busch
@ 2019-05-13 15:01     ` Akinobu Mita
  0 siblings, 0 replies; 22+ messages in thread
From: Akinobu Mita @ 2019-05-13 15:01 UTC (permalink / raw)
  To: Keith Busch
  Cc: linux-nvme, linux-kernel, Johannes Berg, Busch, Keith,
	Jens Axboe, Christoph Hellwig, Sagi Grimberg, Minwoo Im, Heitke,
	Kenneth

On Mon, May 13, 2019 at 22:55, Keith Busch <kbusch@kernel.org> wrote:
>
> On Sun, May 12, 2019 at 08:54:15AM -0700, Akinobu Mita wrote:
> > +static void nvme_coredump_logs(struct nvme_dev *dev)
> > +{
> > +     struct dev_coredumpm_bulk_data *bulk_data;
> > +
> > +     if (!dev->dumps)
> > +             return;
> > +
> > +     bulk_data = nvme_coredump_alloc(dev, 1);
> > +     if (!bulk_data)
> > +             return;
> > +
> > +     if (nvme_coredump_telemetry_log(bulk_data, &dev->ctrl))
> > +             dev->num_dumps--;
> > +}
>
> You'll need this function to return the same 'int' value from
> nvme_coredump_telemetry_log. A negative value here means that the
> device didn't produce a response, and that's important to check from
> the reset work since you'll need to abort the reset if that happens.

OK.  Makes sense.


* Re: [PATCH v3 5/7] nvme-pci: add device coredump infrastructure
  2019-05-13 14:02   ` Christoph Hellwig
@ 2019-05-13 15:01     ` Akinobu Mita
  0 siblings, 0 replies; 22+ messages in thread
From: Akinobu Mita @ 2019-05-13 15:01 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-nvme, LKML, Johannes Berg, Keith Busch, Jens Axboe,
	Sagi Grimberg, Minwoo Im, Kenneth Heitke

On Mon, May 13, 2019 at 23:03, Christoph Hellwig <hch@lst.de> wrote:
>
> Usage of a scatterlist here is rather bogus as we never use
> it for dma mapping.  Why can't you store the various pages in a
> large bio_vec and then just issue that to the device in one
> get log page command?  (or at least a few if MDTS kicks in?)

OK.  I'll try to use bio_vec and see how it goes.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 5/7] nvme-pci: add device coredump infrastructure
  2019-05-13  7:46   ` [PATCH v3 5/7] nvme-pci: add device coredump infrastructure Minwoo Im
@ 2019-05-13 15:23     ` Chaitanya Kulkarni
  2019-05-14 14:06       ` Akinobu Mita
  0 siblings, 1 reply; 22+ messages in thread
From: Chaitanya Kulkarni @ 2019-05-13 15:23 UTC (permalink / raw)
  To: minwoo.im, linux-nvme, linux-kernel
  Cc: Keith Busch, Sagi Grimberg, Kenneth Heitke, Akinobu Mita,
	Jens Axboe, Minwoo Im, Johannes Berg, Christoph Hellwig

On 05/13/2019 12:46 AM, Minwoo Im wrote:
>> +static int nvme_get_telemetry_log_blocks(struct nvme_ctrl *ctrl, void *buf,
>> +					 size_t bytes, loff_t offset)
>> +{
>> +	loff_t pos = 0;
>> +	u32 chunk_size;
>> +
>> +	if (check_mul_overflow(ctrl->max_hw_sectors, 512u, &chunk_size))
>> +		chunk_size = UINT_MAX;
>> +
>> +	while (pos < bytes) {
>> +		size_t size = min_t(size_t, bytes - pos, chunk_size);
>> +		int ret;
>> +
>> +		ret = nvme_get_log(ctrl, NVME_NSID_ALL, NVME_LOG_TELEMETRY_CTRL,
>> +				   0, buf + pos, size, offset + pos);
>> +		if (ret)
>> +			return ret;
>> +
>> +		pos += size;
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +static int nvme_get_telemetry_log(struct nvme_ctrl *ctrl,
>> +				  struct sg_table *table, size_t bytes)
>> +{
>> +	int n = sg_nents(table->sgl);
>> +	struct scatterlist *sg;
>> +	size_t offset = 0;
>> +	int i;
>> +
A short comment would be nice here, since you are using sg operations.
>> +	for_each_sg(table->sgl, sg, n, i) {
>> +		struct page *page = sg_page(sg);
>> +		size_t size = min_t(int, bytes - offset, sg->length);
>> +		int ret;
>> +
>> +		ret = nvme_get_telemetry_log_blocks(ctrl, page_address(page),
>> +						    size, offset);
>> +		if (ret)
>> +			return ret;
>> +
>> +		offset += size;
>> +	}
>> +
>> +	return 0;
>> +}
>
> Can we have those two in nvme-core module instead of being in pci module?

Since they operate on the controller, they should be moved next to
nvme_get_log() in ${KERN_DIR}/drivers/nvme/host/core.c.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 2/7] devcoredump: fix typo in comment
  2019-05-12 15:54 ` [PATCH v3 2/7] devcoredump: fix typo in comment Akinobu Mita
@ 2019-05-13 15:24   ` Chaitanya Kulkarni
  0 siblings, 0 replies; 22+ messages in thread
From: Chaitanya Kulkarni @ 2019-05-13 15:24 UTC (permalink / raw)
  To: Akinobu Mita, linux-nvme, linux-kernel
  Cc: Jens Axboe, Sagi Grimberg, Kenneth Heitke, Keith Busch,
	Minwoo Im, Johannes Berg, Christoph Hellwig

Looks good.

Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>


On 05/12/2019 08:55 AM, Akinobu Mita wrote:
> s/dev_coredumpmsg/dev_coredumpsg/
>
> Cc: Johannes Berg <johannes@sipsolutions.net>
> Cc: Keith Busch <keith.busch@intel.com>
> Cc: Jens Axboe <axboe@fb.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Sagi Grimberg <sagi@grimberg.me>
> Cc: Minwoo Im <minwoo.im.dev@gmail.com>
> Cc: Kenneth Heitke <kenneth.heitke@intel.com>
> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
> ---
> * v3
> - No change since v2
>
>   drivers/base/devcoredump.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/base/devcoredump.c b/drivers/base/devcoredump.c
> index 3c960a6..e42d0b5 100644
> --- a/drivers/base/devcoredump.c
> +++ b/drivers/base/devcoredump.c
> @@ -314,7 +314,7 @@ void dev_coredumpm(struct device *dev, struct module *owner,
>   EXPORT_SYMBOL_GPL(dev_coredumpm);
>
>   /**
> - * dev_coredumpmsg - create device coredump that uses scatterlist as data
> + * dev_coredumpsg - create device coredump that uses scatterlist as data
>    * parameter
>    * @dev: the struct device for the crashed device
>    * @table: the dump data
>


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 1/7] devcoredump: use memory_read_from_buffer
  2019-05-12 15:54 ` [PATCH v3 1/7] devcoredump: use memory_read_from_buffer Akinobu Mita
@ 2019-05-13 15:28   ` Chaitanya Kulkarni
  0 siblings, 0 replies; 22+ messages in thread
From: Chaitanya Kulkarni @ 2019-05-13 15:28 UTC (permalink / raw)
  To: Akinobu Mita, linux-nvme, linux-kernel
  Cc: Jens Axboe, Sagi Grimberg, Kenneth Heitke, Keith Busch,
	Minwoo Im, Johannes Berg, Christoph Hellwig

Looks good.

Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>

On 05/12/2019 08:55 AM, Akinobu Mita wrote:
> Use memory_read_from_buffer() to simplify devcd_readv().
>
> Cc: Johannes Berg <johannes@sipsolutions.net>
> Cc: Keith Busch <keith.busch@intel.com>
> Cc: Jens Axboe <axboe@fb.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Sagi Grimberg <sagi@grimberg.me>
> Cc: Minwoo Im <minwoo.im.dev@gmail.com>
> Cc: Kenneth Heitke <kenneth.heitke@intel.com>
> Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
> ---
> * v3
> - No change since v2
>
>   drivers/base/devcoredump.c | 11 +----------
>   1 file changed, 1 insertion(+), 10 deletions(-)
>
> diff --git a/drivers/base/devcoredump.c b/drivers/base/devcoredump.c
> index f1a3353..3c960a6 100644
> --- a/drivers/base/devcoredump.c
> +++ b/drivers/base/devcoredump.c
> @@ -164,16 +164,7 @@ static struct class devcd_class = {
>   static ssize_t devcd_readv(char *buffer, loff_t offset, size_t count,
>   			   void *data, size_t datalen)
>   {
> -	if (offset > datalen)
> -		return -EINVAL;
> -
> -	if (offset + count > datalen)
> -		count = datalen - offset;
> -
> -	if (count)
> -		memcpy(buffer, ((u8 *)data) + offset, count);
> -
> -	return count;
> +	return memory_read_from_buffer(buffer, count, &offset, data, datalen);
>   }
>
>   static void devcd_freev(void *data)
>


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 4/7] nvme: add basic facility to get telemetry log page
  2019-05-12 15:54 ` [PATCH v3 4/7] nvme: add basic facility to get telemetry log page Akinobu Mita
@ 2019-05-13 15:34   ` Chaitanya Kulkarni
  2019-05-14 14:04     ` Akinobu Mita
  0 siblings, 1 reply; 22+ messages in thread
From: Chaitanya Kulkarni @ 2019-05-13 15:34 UTC (permalink / raw)
  To: Akinobu Mita, linux-nvme, linux-kernel
  Cc: Jens Axboe, Sagi Grimberg, Kenneth Heitke, Keith Busch,
	Minwoo Im, Johannes Berg, Christoph Hellwig

On 05/12/2019 08:55 AM, Akinobu Mita wrote:
> This adds the required definisions to get telemetry log page.
s/definisions/definitions/
> The telemetry log page structure and identifier are copied from nvme-cli.
>
> We also need a facility to check log page attributes in order to know
> the controller supports the telemetry log pages and log page offset field
> for the Get Log Page command.  The telemetry data area could be larger
> than maximum data transfer size, so we may need to split into multiple
> transfers with incremental page offset.
>
> Cc: Johannes Berg <johannes@sipsolutions.net>
> Cc: Keith Busch <keith.busch@intel.com>
> Cc: Jens Axboe <axboe@fb.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Sagi Grimberg <sagi@grimberg.me>
> Cc: Minwoo Im <minwoo.im.dev@gmail.com>
> Cc: Kenneth Heitke <kenneth.heitke@intel.com>
> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
> ---
> * v3
> - Merge 'add telemetry log page definisions' patch and 'add facility to
>    check log page attributes' patch
> - Copy struct nvme_telemetry_log_page_hdr from the latest nvme-cli
> - Add BUILD_BUG_ON for the size of struct nvme_telemetry_log_page_hdr
>
>   drivers/nvme/host/core.c |  2 ++
>   drivers/nvme/host/nvme.h |  1 +
>   include/linux/nvme.h     | 17 +++++++++++++++++
>   3 files changed, 20 insertions(+)
>
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index a6644a2..0cea2a8 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -2585,6 +2585,7 @@ int nvme_init_identify(struct nvme_ctrl *ctrl)
>   	} else
>   		ctrl->shutdown_timeout = shutdown_timeout;
>
> +	ctrl->lpa = id->lpa;
>   	ctrl->npss = id->npss;
>   	ctrl->apsta = id->apsta;
>   	prev_apst_enabled = ctrl->apst_enabled;
> @@ -3898,6 +3899,7 @@ static inline void _nvme_check_size(void)
>   	BUILD_BUG_ON(sizeof(struct nvme_id_ctrl) != NVME_IDENTIFY_DATA_SIZE);
>   	BUILD_BUG_ON(sizeof(struct nvme_id_ns) != NVME_IDENTIFY_DATA_SIZE);
>   	BUILD_BUG_ON(sizeof(struct nvme_lba_range_type) != 64);
> +	BUILD_BUG_ON(sizeof(struct nvme_telemetry_log_page_hdr) != 512);
>   	BUILD_BUG_ON(sizeof(struct nvme_smart_log) != 512);
>   	BUILD_BUG_ON(sizeof(struct nvme_dbbuf) != 64);
>   	BUILD_BUG_ON(sizeof(struct nvme_directive_cmd) != 64);
> diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
> index 5ee75b5..7f6f1fc 100644
> --- a/drivers/nvme/host/nvme.h
> +++ b/drivers/nvme/host/nvme.h
> @@ -195,6 +195,7 @@ struct nvme_ctrl {
>   	u32 vs;
>   	u32 sgls;
>   	u16 kas;
> +	u8 lpa;
>   	u8 npss;
>   	u8 apsta;
>   	u32 oaes;
> diff --git a/include/linux/nvme.h b/include/linux/nvme.h
> index c40720c..8c0b29d 100644
> --- a/include/linux/nvme.h
> +++ b/include/linux/nvme.h
> @@ -294,6 +294,8 @@ enum {
>   	NVME_CTRL_OACS_DIRECTIVES		= 1 << 5,
>   	NVME_CTRL_OACS_DBBUF_SUPP		= 1 << 8,
>   	NVME_CTRL_LPA_CMD_EFFECTS_LOG		= 1 << 1,
> +	NVME_CTRL_LPA_EXTENDED_DATA		= 1 << 2,
> +	NVME_CTRL_LPA_TELEMETRY_LOG		= 1 << 3,
>   };
>
>   struct nvme_lbaf {
> @@ -396,6 +398,20 @@ enum {
>   	NVME_NIDT_UUID		= 0x03,
>   };
>
> +struct nvme_telemetry_log_page_hdr {
> +	__u8    lpi; /* Log page identifier */
> +	__u8    rsvd[4];
> +	__u8    iee_oui[3];
> +	__le16  dalb1; /* Data area 1 last block */
> +	__le16  dalb2; /* Data area 2 last block */
> +	__le16  dalb3; /* Data area 3 last block */
> +	__u8    rsvd1[368];
> +	__u8    ctrlavail; /* Controller initiated data avail?*/
> +	__u8    ctrldgn; /* Controller initiated telemetry Data Gen # */
> +	__u8    rsnident[128];
> +	__u8    telemetry_dataarea[0];
> +};
> +

nit: Thanks for adding the comments; could you please align all the above
comments like this:

+struct nvme_telemetry_log_page_hdr {
+       __u8    lpi;            /* Log page identifier */
+       __u8    rsvd[4];
+       __u8    iee_oui[3];
+       __le16  dalb1;          /* Data area 1 last block */
+       __le16  dalb2;          /* Data area 2 last block */
+       __le16  dalb3;          /* Data area 3 last block */
+       __u8    rsvd1[368];
+       __u8    ctrlavail;      /* Controller initiated data avail?*/
+       __u8    ctrldgn;        /* Controller initiated telemetry Data Gen # */
+       __u8    rsnident[128];
+       __u8    telemetry_dataarea[0];
+};
+


>   struct nvme_smart_log {
>   	__u8			critical_warning;
>   	__u8			temperature[2];
> @@ -832,6 +848,7 @@ enum {
>   	NVME_LOG_FW_SLOT	= 0x03,
>   	NVME_LOG_CHANGED_NS	= 0x04,
>   	NVME_LOG_CMD_EFFECTS	= 0x05,
> +	NVME_LOG_TELEMETRY_CTRL	= 0x08,
>   	NVME_LOG_ANA		= 0x0c,
>   	NVME_LOG_DISC		= 0x70,
>   	NVME_LOG_RESERVATION	= 0x80,
>
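
For reference, a sketch of how a caller might consult the new lpa field
and bits from the diff above before attempting the telemetry transfer;
this usage inside a hypothetical fetch function is illustrative, not
taken from the series:

	/* Telemetry log pages must be advertised by the controller. */
	if (!(ctrl->lpa & NVME_CTRL_LPA_TELEMETRY_LOG))
		return -EOPNOTSUPP;

	/* The Log Page Offset field is needed for multi-part transfers. */
	if (!(ctrl->lpa & NVME_CTRL_LPA_EXTENDED_DATA))
		return -EOPNOTSUPP;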


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 4/7] nvme: add basic facility to get telemetry log page
  2019-05-13 15:34   ` Chaitanya Kulkarni
@ 2019-05-14 14:04     ` Akinobu Mita
  0 siblings, 0 replies; 22+ messages in thread
From: Akinobu Mita @ 2019-05-14 14:04 UTC (permalink / raw)
  To: Chaitanya Kulkarni
  Cc: linux-nvme, linux-kernel, Jens Axboe, Sagi Grimberg,
	Kenneth Heitke, Keith Busch, Minwoo Im, Johannes Berg,
	Christoph Hellwig

On Tue, May 14, 2019 at 0:34 Chaitanya Kulkarni <Chaitanya.Kulkarni@wdc.com> wrote:
>
> On 05/12/2019 08:55 AM, Akinobu Mita wrote:
> > This adds the required definisions to get telemetry log page.
> s/definisions/definitions/

OK.

> > diff --git a/include/linux/nvme.h b/include/linux/nvme.h
> > index c40720c..8c0b29d 100644
> > --- a/include/linux/nvme.h
> > +++ b/include/linux/nvme.h
> > @@ -294,6 +294,8 @@ enum {
> >       NVME_CTRL_OACS_DIRECTIVES               = 1 << 5,
> >       NVME_CTRL_OACS_DBBUF_SUPP               = 1 << 8,
> >       NVME_CTRL_LPA_CMD_EFFECTS_LOG           = 1 << 1,
> > +     NVME_CTRL_LPA_EXTENDED_DATA             = 1 << 2,
> > +     NVME_CTRL_LPA_TELEMETRY_LOG             = 1 << 3,
> >   };
> >
> >   struct nvme_lbaf {
> > @@ -396,6 +398,20 @@ enum {
> >       NVME_NIDT_UUID          = 0x03,
> >   };
> >
> > +struct nvme_telemetry_log_page_hdr {
> > +     __u8    lpi; /* Log page identifier */
> > +     __u8    rsvd[4];
> > +     __u8    iee_oui[3];
> > +     __le16  dalb1; /* Data area 1 last block */
> > +     __le16  dalb2; /* Data area 2 last block */
> > +     __le16  dalb3; /* Data area 3 last block */
> > +     __u8    rsvd1[368];
> > +     __u8    ctrlavail; /* Controller initiated data avail?*/
> > +     __u8    ctrldgn; /* Controller initiated telemetry Data Gen # */
> > +     __u8    rsnident[128];
> > +     __u8    telemetry_dataarea[0];
> > +};
> > +
>
> nit: Thanks for adding the comments; could you please align all the above
> comments like this:

OK.  I'll send a patch for nvme-cli first.

> +struct nvme_telemetry_log_page_hdr {
> +       __u8    lpi;            /* Log page identifier */
> +       __u8    rsvd[4];
> +       __u8    iee_oui[3];
> +       __le16  dalb1;          /* Data area 1 last block */
> +       __le16  dalb2;          /* Data area 2 last block */
> +       __le16  dalb3;          /* Data area 3 last block */
> +       __u8    rsvd1[368];
> +       __u8    ctrlavail;      /* Controller initiated data avail?*/
> +       __u8    ctrldgn;        /* Controller initiated telemetry Data Gen # */
> +       __u8    rsnident[128];
> +       __u8    telemetry_dataarea[0];
> +};
> +

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 5/7] nvme-pci: add device coredump infrastructure
  2019-05-13 15:23     ` Chaitanya Kulkarni
@ 2019-05-14 14:06       ` Akinobu Mita
  0 siblings, 0 replies; 22+ messages in thread
From: Akinobu Mita @ 2019-05-14 14:06 UTC (permalink / raw)
  To: Chaitanya Kulkarni
  Cc: minwoo.im, linux-nvme, linux-kernel, Keith Busch, Sagi Grimberg,
	Kenneth Heitke, Jens Axboe, Minwoo Im, Johannes Berg,
	Christoph Hellwig

On Tue, May 14, 2019 at 0:23 Chaitanya Kulkarni <Chaitanya.Kulkarni@wdc.com> wrote:
>
> On 05/13/2019 12:46 AM, Minwoo Im wrote:
> >> +static int nvme_get_telemetry_log_blocks(struct nvme_ctrl *ctrl, void *buf,
> >> +                                     size_t bytes, loff_t offset)
> >> +{
> >> +    loff_t pos = 0;
> >> +    u32 chunk_size;
> >> +
> >> +    if (check_mul_overflow(ctrl->max_hw_sectors, 512u, &chunk_size))
> >> +            chunk_size = UINT_MAX;
> >> +
> >> +    while (pos < bytes) {
> >> +            size_t size = min_t(size_t, bytes - pos, chunk_size);
> >> +            int ret;
> >> +
> >> +            ret = nvme_get_log(ctrl, NVME_NSID_ALL, NVME_LOG_TELEMETRY_CTRL,
> >> +                               0, buf + pos, size, offset + pos);
> >> +            if (ret)
> >> +                    return ret;
> >> +
> >> +            pos += size;
> >> +    }
> >> +
> >> +    return 0;
> >> +}
> >> +
> >> +static int nvme_get_telemetry_log(struct nvme_ctrl *ctrl,
> >> +                              struct sg_table *table, size_t bytes)
> >> +{
> >> +    int n = sg_nents(table->sgl);
> >> +    struct scatterlist *sg;
> >> +    size_t offset = 0;
> >> +    int i;
> >> +
> A short comment would be nice here, since you are using sg operations.
> >> +    for_each_sg(table->sgl, sg, n, i) {
> >> +            struct page *page = sg_page(sg);
> >> +            size_t size = min_t(int, bytes - offset, sg->length);
> >> +            int ret;
> >> +
> >> +            ret = nvme_get_telemetry_log_blocks(ctrl, page_address(page),
> >> +                                                size, offset);
> >> +            if (ret)
> >> +                    return ret;
> >> +
> >> +            offset += size;
> >> +    }
> >> +
> >> +    return 0;
> >> +}
> >
> > Can we have those two in nvme-core module instead of being in pci module?
>
> Since they operate on the controller, they should be moved next to
> nvme_get_log() in ${KERN_DIR}/drivers/nvme/host/core.c.

OK.  But these functions will be changed to use bio_vec instead of sg in
the next version.
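
As a rough illustration of that conversion, the page gathering could
look like the sketch below; nvme_coredump_alloc_bvec() is a
hypothetical name and the actual next version may differ:

/*
 * Hypothetical sketch: allocate the dump pages into a plain bio_vec
 * array rather than an sg_table.
 */
static struct bio_vec *nvme_coredump_alloc_bvec(int nr_pages)
{
	struct bio_vec *bvec;
	int i;

	bvec = kcalloc(nr_pages, sizeof(*bvec), GFP_KERNEL);
	if (!bvec)
		return NULL;

	for (i = 0; i < nr_pages; i++) {
		struct page *page = alloc_page(GFP_KERNEL);

		if (!page)
			goto free_pages;

		bvec[i].bv_page = page;
		bvec[i].bv_len = PAGE_SIZE;
		bvec[i].bv_offset = 0;
	}

	return bvec;

free_pages:
	while (i-- > 0)
		__free_page(bvec[i].bv_page);
	kfree(bvec);
	return NULL;
}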

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2019-05-14 14:06 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-05-12 15:54 [PATCH v3 0/7] nvme-pci: support device coredump Akinobu Mita
2019-05-12 15:54 ` [PATCH v3 1/7] devcoredump: use memory_read_from_buffer Akinobu Mita
2019-05-13 15:28   ` Chaitanya Kulkarni
2019-05-12 15:54 ` [PATCH v3 2/7] devcoredump: fix typo in comment Akinobu Mita
2019-05-13 15:24   ` Chaitanya Kulkarni
2019-05-12 15:54 ` [PATCH v3 3/7] devcoredump: allow to create several coredump files in one device Akinobu Mita
2019-05-12 15:54 ` [PATCH v3 4/7] nvme: add basic facility to get telemetry log page Akinobu Mita
2019-05-13 15:34   ` Chaitanya Kulkarni
2019-05-14 14:04     ` Akinobu Mita
2019-05-12 15:54 ` [PATCH v3 5/7] nvme-pci: add device coredump infrastructure Akinobu Mita
2019-05-13 13:50   ` Keith Busch
2019-05-13 15:01     ` Akinobu Mita
2019-05-13 14:02   ` Christoph Hellwig
2019-05-13 15:01     ` Akinobu Mita
2019-05-12 15:54 ` [PATCH v3 6/7] nvme-pci: trigger device coredump on command timeout Akinobu Mita
2019-05-13 13:52   ` Keith Busch
2019-05-12 15:54 ` [PATCH v3 7/7] nvme-pci: enable to trigger device coredump by hand Akinobu Mita
     [not found] ` <CGME20190512155533epcas4p110edff15ebf5b2efae32e43f0f10ab59@epcms2p5>
2019-05-13  7:41   ` [PATCH v3 6/7] nvme-pci: trigger device coredump on command timeout Minwoo Im
2019-05-13 14:59     ` Akinobu Mita
     [not found] ` <CGME20190512155540epcas4p14c15eb86b08dcd281e9a93a4fc190800@epcms2p1>
2019-05-13  7:46   ` [PATCH v3 5/7] nvme-pci: add device coredump infrastructure Minwoo Im
2019-05-13 15:23     ` Chaitanya Kulkarni
2019-05-14 14:06       ` Akinobu Mita
