linux-ide.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [v5 00/12] Add persistent durable identifier to storage log messages
@ 2020-09-25 16:19 Tony Asleson
  2020-09-25 16:19 ` [v5 01/12] struct device: Add function callback durable_name Tony Asleson
                   ` (11 more replies)
  0 siblings, 12 replies; 35+ messages in thread
From: Tony Asleson @ 2020-09-25 16:19 UTC (permalink / raw)
  To: linux-scsi, linux-block, linux-ide

Today users have no easy way to correlate kernel log messages for storage
devices across reboots, device dynamic add/remove, or when the device is
physically or logically moved from from system to system.  This is due
to the existing log IDs which identify how the device is attached and not
a unique ID of what is attached.  Additionally, even when the attachment
hasn't changed, it's not always obvious which messages belong to the
device as the different areas in the storage stack use different
identifiers, eg. (sda, sata1.00, sd 0:0:0:0).

This change addresses this by adding a unique ID to each log
message.  It couples the existing structured key/value logging capability
and VPD 0x83 device identification.  The structured key/value data is not
visible in normal viewing and is not seen in the dmesg output or journal
output unless you go looking for it by dumping the output as JSON.

Some examples of logs filtered for a specific device utilizing this patch
series.

$ journalctl -b  _KERNEL_DURABLE_NAME="`cat /sys/block/sdb/device/wwid`" 
| cut -c 25- | fmt -t
l: scsi 1:0:0:0: Attached scsi generic sg1 type 0
l: sd 1:0:0:0: [sdb] 209715200 512-byte logical blocks: (107 GB/100 GiB)
l: sd 1:0:0:0: [sdb] Write Protect is off
l: sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
l: sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't
   support DPO or FUA
l: sd 1:0:0:0: [sdb] Attached SCSI disk
l: ata2.00: exception Emask 0x0 SAct 0x8 SErr 0x8 action 0x6 frozen
l: ata2.00: failed command: READ FPDMA QUEUED
l: ata2.00: cmd 60/01:18:10:27:00/00:00:00:00:00/40 tag 3 ncq dma 512
            in res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
l: ata2.00: status: { DRDY }
l: ata2.00: configured for UDMA/100
l: ata2.00: device reported invalid CHS sector 0
l: ata2.00: exception Emask 0x0 SAct 0x4000 SErr 0x4000 action 0x6 frozen
l: ata2.00: failed command: READ FPDMA QUEUED
l: ata2.00: cmd 60/01:70:10:27:00/00:00:00:00:00/40 tag 14 ncq dma 512
            in res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
l: ata2.00: status: { DRDY }
l: ata2.00: configured for UDMA/100
l: ata2.00: device reported invalid CHS sector 0
l: ata2.00: exception Emask 0x0 SAct 0x80000000 SErr 0x80000000 action
            0x6 frozen
l: ata2.00: failed command: READ FPDMA QUEUED
l: ata2.00: cmd 60/01:f8:10:27:00/00:00:00:00:00/40 tag 31 ncq dma 512
            in res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
l: ata2.00: status: { DRDY }
l: ata2.00: configured for UDMA/100
l: ata2.00: NCQ disabled due to excessive errors
l: ata2.00: exception Emask 0x0 SAct 0x40000 SErr 0x40000 action 0x6
            frozen
l: ata2.00: failed command: READ FPDMA QUEUED
l: ata2.00: cmd 60/01:90:10:27:00/00:00:00:00:00/40 tag 18 ncq dma 512
            in res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
l: ata2.00: status: { DRDY }
l: ata2.00: configured for UDMA/100

$ journalctl -b  _KERNEL_DURABLE_NAME="`cat /sys/block/nvme0n1/wwid`" 
| cut -c 25- | fmt -t
l: blk_update_request: critical medium error, dev nvme0n1, sector 10000
   op 0x0:(READ) flags 0x80700 phys_seg 4 prio class 0
l: blk_update_request: critical medium error, dev nvme0n1, sector 10000
   op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
l: Buffer I/O error on dev nvme0n1, logical block 1250, async page read

$ journalctl -b  _KERNEL_DURABLE_NAME="`cat /sys/block/sdc/device/wwid`"
| cut -c 25- | fmt -t
l: sd 8:0:0:0: Power-on or device reset occurred
l: sd 8:0:0:0: [sdc] 16777216 512-byte logical blocks: (8.59 GB/8.00 GiB)
l: sd 8:0:0:0: Attached scsi generic sg2 type 0
l: sd 8:0:0:0: [sdc] Write Protect is off
l: sd 8:0:0:0: [sdc] Mode Sense: 63 00 00 08
l: sd 8:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't
   support DPO or FUA
l: sd 8:0:0:0: [sdc] Attached SCSI disk
l: sd 8:0:0:0: [sdc] tag#255 FAILED Result: hostbyte=DID_OK
   driverbyte=DRIVER_SENSE cmd_age=0s
l: sd 8:0:0:0: [sdc] tag#255 Sense Key : Medium Error [current]
l: sd 8:0:0:0: [sdc] tag#255 Add. Sense: Unrecovered read error
l: sd 8:0:0:0: [sdc] tag#255 CDB: Read(10) 28 00 00 00 27 10 00 00 01 00
l: blk_update_request: critical medium error, dev sdc, sector 10000 op
   0x0:(READ) flags 0x0 phys_seg 1 prio class 0

There should be no changes to the log message content with this patch series.
I ran release kernel and this patch series and did a compare while forcing the
kernel through the same errors paths to verify.

The first 6 commits in the patch series utilize changes needed for dev_printk
code path.  The last 6 commits in the patch add the needed changes to utilize
durable_name_printk.  The function durable_name_printk is nothing more than
a printk that adds structured key/value durable name to unmodified printk
output.  I structured it this way so only a subset of the patch series could
be theoretically applied if we cannot get agreement on complete patch series.

v2:
- Incorporated changes suggested by James Bottomley
- Removed string function which removed leading/trailing/duplicate adjacent
  spaces from generated id, value matches /sys/block/<device>/device/wwid
- Remove xfs patch, limiting changes to lower block layers
- Moved callback from struct device_type to struct device.  Struct device_type
  is typically static const and with a number of different areas using shared
  implementation of genhd unable to modify for each of the different areas.

v3:
- Increase the size of the buffers for NVMe id generation and
  dev_vprintk_emit
  
v4:
- Back out dev_printk for those locations that weren't using it before, so that
  we don't change the content of the user visible log message by using a
  function durable_name_printk.
- Remove RFC from patch series.

v5:
- Reduced stack usage for nvme wwid
- Make function ata_scsi_durable_name static, found by kernel test robot
- Incorporated suggested changes from Andy Shevchenko and Sergei Shtylyov
  * Remove unneeded line spacing
  * Correct spelling
  * Remove unneeded () in conditional operator
  * Re-worked expressions to follow common kernel patterns, added
    function dev_to_scsi_device
- Re-based for v5.8 branch

Tony Asleson (12):
  struct device: Add function callback durable_name
  create_syslog_header: Add durable name
  dev_vprintk_emit: Increase hdr size
  scsi: Add durable_name for dev_printk
  nvme: Add durable name for dev_printk
  libata: Add ata_scsi_durable_name
  libata: Make ata_scsi_durable_name static
  Add durable_name_printk
  libata: use durable_name_printk
  Add durable_name_printk_ratelimited
  print_req_error: Use durable_name_printk_ratelimited
  buffer_io_error: Use durable_name_printk_ratelimited

 block/blk-core.c           |  5 ++++-
 drivers/ata/libata-core.c  | 17 +++++++-------
 drivers/ata/libata-scsi.c  | 18 ++++++++++++---
 drivers/base/core.c        | 46 +++++++++++++++++++++++++++++++++++++-
 drivers/nvme/host/core.c   | 18 +++++++++++++++
 drivers/scsi/scsi_lib.c    |  9 ++++++++
 drivers/scsi/scsi_sysfs.c  | 35 +++++++++++++++++++++++------
 drivers/scsi/sd.c          |  2 ++
 fs/buffer.c                | 15 +++++++++----
 include/linux/dev_printk.h | 14 ++++++++++++
 include/linux/device.h     |  4 ++++
 include/scsi/scsi_device.h |  3 +++
 12 files changed, 162 insertions(+), 24 deletions(-)


base-commit: bcf876870b95592b52519ed4aafcf9d95999bc9c
-- 
2.26.2


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [v5 01/12] struct device: Add function callback durable_name
  2020-09-25 16:19 [v5 00/12] Add persistent durable identifier to storage log messages Tony Asleson
@ 2020-09-25 16:19 ` Tony Asleson
  2020-09-26  9:08   ` Sergei Shtylyov
  2020-09-29 17:51   ` Christoph Hellwig
  2020-09-25 16:19 ` [v5 02/12] create_syslog_header: Add durable name Tony Asleson
                   ` (10 subsequent siblings)
  11 siblings, 2 replies; 35+ messages in thread
From: Tony Asleson @ 2020-09-25 16:19 UTC (permalink / raw)
  To: linux-scsi, linux-block, linux-ide

Function callback and function to be used to write a persistent
durable name to the supplied character buffer.  This will be used to add
structured key-value data to log messages for hardware related errors
which allows end users to correlate message and specific hardware.

Signed-off-by: Tony Asleson <tasleson@redhat.com>
---
 drivers/base/core.c    | 24 ++++++++++++++++++++++++
 include/linux/device.h |  4 ++++
 2 files changed, 28 insertions(+)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 05d414e9e8a4..88696ade8bfc 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -2489,6 +2489,30 @@ int dev_set_name(struct device *dev, const char *fmt, ...)
 }
 EXPORT_SYMBOL_GPL(dev_set_name);
 
+/**
+ * dev_durable_name - Write "DURABLE_NAME"=<durable name> in buffer
+ * @dev: device
+ * @buffer: character buffer to write results
+ * @len: length of buffer
+ * @return: Number of bytes written to buffer
+ */
+int dev_durable_name(const struct device *dev, char *buffer, size_t len)
+{
+	int tmp, dlen;
+
+	if (dev && dev->durable_name) {
+		tmp = snprintf(buffer, len, "DURABLE_NAME=");
+		if (tmp < len) {
+			dlen = dev->durable_name(dev, buffer + tmp,
+							len - tmp);
+			if (dlen > 0 && ((dlen + tmp) < len))
+				return dlen + tmp;
+		}
+	}
+	return 0;
+}
+EXPORT_SYMBOL_GPL(dev_durable_name);
+
 /**
  * device_to_dev_kobj - select a /sys/dev/ directory for the device
  * @dev: device
diff --git a/include/linux/device.h b/include/linux/device.h
index 5efed864b387..074125999dd8 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -614,6 +614,8 @@ struct device {
 	struct iommu_group	*iommu_group;
 	struct dev_iommu	*iommu;
 
+	int (*durable_name)(const struct device *dev, char *buff, size_t len);
+
 	bool			offline_disabled:1;
 	bool			offline:1;
 	bool			of_node_reused:1;
@@ -655,6 +657,8 @@ static inline const char *dev_name(const struct device *dev)
 extern __printf(2, 3)
 int dev_set_name(struct device *dev, const char *name, ...);
 
+int dev_durable_name(const struct device *d, char *buffer, size_t len);
+
 #ifdef CONFIG_NUMA
 static inline int dev_to_node(struct device *dev)
 {
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [v5 02/12] create_syslog_header: Add durable name
  2020-09-25 16:19 [v5 00/12] Add persistent durable identifier to storage log messages Tony Asleson
  2020-09-25 16:19 ` [v5 01/12] struct device: Add function callback durable_name Tony Asleson
@ 2020-09-25 16:19 ` Tony Asleson
  2020-09-25 16:19 ` [v5 03/12] dev_vprintk_emit: Increase hdr size Tony Asleson
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 35+ messages in thread
From: Tony Asleson @ 2020-09-25 16:19 UTC (permalink / raw)
  To: linux-scsi, linux-block, linux-ide

This gets us a persistent durable name for code that logs messages in the
block layer that have the appropriate callbacks setup for durable name.

Signed-off-by: Tony Asleson <tasleson@redhat.com>
---
 drivers/base/core.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 88696ade8bfc..adef36d4b475 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -3864,6 +3864,7 @@ create_syslog_header(const struct device *dev, char *hdr, size_t hdrlen)
 {
 	const char *subsys;
 	size_t pos = 0;
+	int dlen;
 
 	if (dev->class)
 		subsys = dev->class->name;
@@ -3906,6 +3907,10 @@ create_syslog_header(const struct device *dev, char *hdr, size_t hdrlen)
 				"DEVICE=+%s:%s", subsys, dev_name(dev));
 	}
 
+	dlen = dev_durable_name(dev, hdr + (pos + 1), hdrlen - (pos + 1));
+	if (dlen)
+		pos += dlen + 1;
+
 	if (pos >= hdrlen)
 		goto overflow;
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [v5 03/12] dev_vprintk_emit: Increase hdr size
  2020-09-25 16:19 [v5 00/12] Add persistent durable identifier to storage log messages Tony Asleson
  2020-09-25 16:19 ` [v5 01/12] struct device: Add function callback durable_name Tony Asleson
  2020-09-25 16:19 ` [v5 02/12] create_syslog_header: Add durable name Tony Asleson
@ 2020-09-25 16:19 ` Tony Asleson
  2020-09-25 16:19 ` [v5 04/12] scsi: Add durable_name for dev_printk Tony Asleson
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 35+ messages in thread
From: Tony Asleson @ 2020-09-25 16:19 UTC (permalink / raw)
  To: linux-scsi, linux-block, linux-ide

With the addition of the device persistent id we have the possibility
of adding 154 more bytes to the hdr.  Thus if we assume the previous
size of 128 was sufficient we can simply add the 2 amounts and round
up.

Signed-off-by: Tony Asleson <tasleson@redhat.com>
---
 drivers/base/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index adef36d4b475..72a93b041a2d 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -3924,7 +3924,7 @@ create_syslog_header(const struct device *dev, char *hdr, size_t hdrlen)
 int dev_vprintk_emit(int level, const struct device *dev,
 		     const char *fmt, va_list args)
 {
-	char hdr[128];
+	char hdr[288];
 	size_t hdrlen;
 
 	hdrlen = create_syslog_header(dev, hdr, sizeof(hdr));
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [v5 04/12] scsi: Add durable_name for dev_printk
  2020-09-25 16:19 [v5 00/12] Add persistent durable identifier to storage log messages Tony Asleson
                   ` (2 preceding siblings ...)
  2020-09-25 16:19 ` [v5 03/12] dev_vprintk_emit: Increase hdr size Tony Asleson
@ 2020-09-25 16:19 ` Tony Asleson
  2020-09-25 16:19 ` [v5 05/12] nvme: Add durable name " Tony Asleson
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 35+ messages in thread
From: Tony Asleson @ 2020-09-25 16:19 UTC (permalink / raw)
  To: linux-scsi, linux-block, linux-ide

Add the needed functions to fill out the durable_name function
call back for scsi based storage devices.  This allows calls
into dev_printk for scsi devices to have a persistent id
associated with them.

Signed-off-by: Tony Asleson <tasleson@redhat.com>
---
 drivers/scsi/scsi_lib.c    |  9 +++++++++
 drivers/scsi/scsi_sysfs.c  | 35 ++++++++++++++++++++++++++++-------
 drivers/scsi/sd.c          |  2 ++
 include/scsi/scsi_device.h |  3 +++
 4 files changed, 42 insertions(+), 7 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 06056e9ec333..aa5601733763 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -3151,3 +3151,12 @@ int scsi_vpd_tpg_id(struct scsi_device *sdev, int *rel_id)
 	return group_id;
 }
 EXPORT_SYMBOL(scsi_vpd_tpg_id);
+
+int scsi_durable_name(struct scsi_device *sdev, char *buf, size_t len)
+{
+	int vpd_len = scsi_vpd_lun_id(sdev, buf, len);
+	if (vpd_len > 0 && vpd_len < len)
+		return vpd_len + 1;
+	return 0;
+}
+EXPORT_SYMBOL(scsi_durable_name);
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 163dbcb741c1..c4840bc80b47 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -504,15 +504,19 @@ static struct class sdev_class = {
 	.dev_release	= scsi_device_cls_release,
 };
 
+static struct scsi_device *dev_to_scsi_device(const struct device *dev)
+{
+	return dev->type == &scsi_dev_type ? to_scsi_device(dev) : NULL;
+}
+
 /* all probing is done in the individual ->probe routines */
 static int scsi_bus_match(struct device *dev, struct device_driver *gendrv)
 {
-	struct scsi_device *sdp;
+	struct scsi_device *sdp = dev_to_scsi_device(dev);
 
-	if (dev->type != &scsi_dev_type)
+	if (!sdp)
 		return 0;
 
-	sdp = to_scsi_device(dev);
 	if (sdp->no_uld_attach)
 		return 0;
 	return (sdp->inq_periph_qual == SCSI_INQ_PQ_CON)? 1: 0;
@@ -520,13 +524,11 @@ static int scsi_bus_match(struct device *dev, struct device_driver *gendrv)
 
 static int scsi_bus_uevent(struct device *dev, struct kobj_uevent_env *env)
 {
-	struct scsi_device *sdev;
+	struct scsi_device *sdev = dev_to_scsi_device(dev);
 
-	if (dev->type != &scsi_dev_type)
+	if (!sdev)
 		return 0;
 
-	sdev = to_scsi_device(dev);
-
 	add_uevent_var(env, "MODALIAS=" SCSI_DEVICE_MODALIAS_FMT, sdev->type);
 	return 0;
 }
@@ -1582,6 +1584,24 @@ static struct device_type scsi_dev_type = {
 	.groups =	scsi_sdev_attr_groups,
 };
 
+int dev_to_scsi_durable_name(const struct device *dev, char *buf, size_t len)
+{
+	/*
+	 * When we go through dev_printk in the scsi layer, dev is embedded
+	 * in a struct scsi_device.  When we go through the block layer,
+	 * dev is embedded in struct genhd, thus we need different paths to
+	 * retrieve the struct scsi_device to call scsi_durable_name.
+	 */
+	struct scsi_device *sdev = dev_to_scsi_device(dev);
+	if (!sdev)
+		sdev = dev_to_scsi_device(dev->parent);
+	if (!sdev)
+		return 0;
+
+	return scsi_durable_name(sdev, buf, len);
+}
+EXPORT_SYMBOL(dev_to_scsi_durable_name);
+
 void scsi_sysfs_device_initialize(struct scsi_device *sdev)
 {
 	unsigned long flags;
@@ -1591,6 +1611,7 @@ void scsi_sysfs_device_initialize(struct scsi_device *sdev)
 	device_initialize(&sdev->sdev_gendev);
 	sdev->sdev_gendev.bus = &scsi_bus_type;
 	sdev->sdev_gendev.type = &scsi_dev_type;
+	sdev->sdev_gendev.durable_name = dev_to_scsi_durable_name;
 	dev_set_name(&sdev->sdev_gendev, "%d:%d:%d:%llu",
 		     sdev->host->host_no, sdev->channel, sdev->id, sdev->lun);
 
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index d90fefffe31b..69ff339fa5ea 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3386,6 +3386,8 @@ static int sd_probe(struct device *dev)
 	gd->private_data = &sdkp->driver;
 	gd->queue = sdkp->device->request_queue;
 
+	disk_to_dev(gd)->durable_name = dev_to_scsi_durable_name;
+
 	/* defaults, until the device tells us otherwise */
 	sdp->sector_size = 512;
 	sdkp->capacity = 0;
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index bc5909033d13..7b6cff11d502 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -464,6 +464,9 @@ extern void sdev_disable_disk_events(struct scsi_device *sdev);
 extern void sdev_enable_disk_events(struct scsi_device *sdev);
 extern int scsi_vpd_lun_id(struct scsi_device *, char *, size_t);
 extern int scsi_vpd_tpg_id(struct scsi_device *, int *);
+extern int dev_to_scsi_durable_name(const struct device *dev, char *buf,
+					size_t len);
+extern int scsi_durable_name(struct scsi_device *sdev, char *buf, size_t len);
 
 #ifdef CONFIG_PM
 extern int scsi_autopm_get_device(struct scsi_device *);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [v5 05/12] nvme: Add durable name for dev_printk
  2020-09-25 16:19 [v5 00/12] Add persistent durable identifier to storage log messages Tony Asleson
                   ` (3 preceding siblings ...)
  2020-09-25 16:19 ` [v5 04/12] scsi: Add durable_name for dev_printk Tony Asleson
@ 2020-09-25 16:19 ` Tony Asleson
  2020-09-25 16:19 ` [v5 06/12] libata: Add ata_scsi_durable_name Tony Asleson
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 35+ messages in thread
From: Tony Asleson @ 2020-09-25 16:19 UTC (permalink / raw)
  To: linux-scsi, linux-block, linux-ide

Changed the comment from // to /* and re-worked buffer
space needed for formatting wwid as requested by
Keith Busch.

ref.
https://lore.kernel.org/linux-block/20200513230455.GA1503@redsun51.ssa.fujisawa.hgst.com/

Signed-off-by: Tony Asleson <tasleson@redhat.com>
---
 drivers/nvme/host/core.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 4ee2330c603e..2e3b808c7815 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2734,6 +2734,22 @@ static bool nvme_validate_cntlid(struct nvme_subsystem *subsys,
 	return true;
 }
 
+static ssize_t wwid_show(struct device *dev, struct device_attribute *attr,
+			char *buf);
+
+static int dev_to_nvme_durable_name(const struct device *dev, char *buf, size_t len)
+{
+	/*
+	 * Max 141 needed for wwid_show, make sure we have the space available
+	 * in our buffer before we format the wwid directly into it.
+	 */
+	if (len >= 141) {
+		ssize_t wwid_len = wwid_show((struct device *)dev, NULL, buf);
+		return wwid_len > 0 ? wwid_len - 1 : 0;  /* remove '\n' */
+	}
+	return 0;
+}
+
 static int nvme_init_subsystem(struct nvme_ctrl *ctrl, struct nvme_id_ctrl *id)
 {
 	struct nvme_subsystem *subsys, *found;
@@ -3663,6 +3679,8 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 	disk->queue = ns->queue;
 	disk->flags = flags;
 	memcpy(disk->disk_name, disk_name, DISK_NAME_LEN);
+	disk_to_dev(disk)->durable_name = dev_to_nvme_durable_name;
+
 	ns->disk = disk;
 
 	if (__nvme_revalidate_disk(disk, id))
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [v5 06/12] libata: Add ata_scsi_durable_name
  2020-09-25 16:19 [v5 00/12] Add persistent durable identifier to storage log messages Tony Asleson
                   ` (4 preceding siblings ...)
  2020-09-25 16:19 ` [v5 05/12] nvme: Add durable name " Tony Asleson
@ 2020-09-25 16:19 ` Tony Asleson
  2020-09-25 16:19 ` [v5 07/12] libata: Make ata_scsi_durable_name static Tony Asleson
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 35+ messages in thread
From: Tony Asleson @ 2020-09-25 16:19 UTC (permalink / raw)
  To: linux-scsi, linux-block, linux-ide

Function used to create the durable name for ata scsi.

Signed-off-by: Tony Asleson <tasleson@redhat.com>
---
 drivers/ata/libata-scsi.c | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 46336084b1a9..194dac7dbdca 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -1086,6 +1086,13 @@ int ata_scsi_dev_config(struct scsi_device *sdev, struct ata_device *dev)
 	return 0;
 }
 
+int ata_scsi_durable_name(const struct device *dev, char *buf, size_t len)
+{
+	struct ata_device *ata_dev = container_of(dev, struct ata_device, tdev);
+
+	return scsi_durable_name(ata_dev->sdev, buf, len);
+}
+
 /**
  *	ata_scsi_slave_config - Set SCSI device attributes
  *	@sdev: SCSI device to examine
@@ -1102,14 +1109,19 @@ int ata_scsi_slave_config(struct scsi_device *sdev)
 {
 	struct ata_port *ap = ata_shost_to_port(sdev->host);
 	struct ata_device *dev = __ata_scsi_find_dev(ap, sdev);
-	int rc = 0;
+	int rc;
 
 	ata_scsi_sdev_config(sdev);
 
-	if (dev)
+	if (dev) {
 		rc = ata_scsi_dev_config(sdev, dev);
+		if (rc)
+			return rc;
 
-	return rc;
+		dev->tdev.durable_name = ata_scsi_durable_name;
+	}
+
+	return 0;
 }
 EXPORT_SYMBOL_GPL(ata_scsi_slave_config);
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [v5 07/12] libata: Make ata_scsi_durable_name static
  2020-09-25 16:19 [v5 00/12] Add persistent durable identifier to storage log messages Tony Asleson
                   ` (5 preceding siblings ...)
  2020-09-25 16:19 ` [v5 06/12] libata: Add ata_scsi_durable_name Tony Asleson
@ 2020-09-25 16:19 ` Tony Asleson
  2020-09-26  8:40   ` Sergei Shtylyov
  2020-09-25 16:19 ` [v5 08/12] Add durable_name_printk Tony Asleson
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 35+ messages in thread
From: Tony Asleson @ 2020-09-25 16:19 UTC (permalink / raw)
  To: linux-scsi, linux-block, linux-ide

Signed-off-by: Tony Asleson <tasleson@redhat.com>
Signed-off-by: kernel test robot <lkp@intel.com>
---
 drivers/ata/libata-scsi.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 194dac7dbdca..13a58ed7184c 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -1086,7 +1086,7 @@ int ata_scsi_dev_config(struct scsi_device *sdev, struct ata_device *dev)
 	return 0;
 }
 
-int ata_scsi_durable_name(const struct device *dev, char *buf, size_t len)
+static int ata_scsi_durable_name(const struct device *dev, char *buf, size_t len)
 {
 	struct ata_device *ata_dev = container_of(dev, struct ata_device, tdev);
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [v5 08/12] Add durable_name_printk
  2020-09-25 16:19 [v5 00/12] Add persistent durable identifier to storage log messages Tony Asleson
                   ` (6 preceding siblings ...)
  2020-09-25 16:19 ` [v5 07/12] libata: Make ata_scsi_durable_name static Tony Asleson
@ 2020-09-25 16:19 ` Tony Asleson
  2020-09-26 23:53   ` Randy Dunlap
  2020-09-25 16:19 ` [v5 09/12] libata: use durable_name_printk Tony Asleson
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 35+ messages in thread
From: Tony Asleson @ 2020-09-25 16:19 UTC (permalink / raw)
  To: linux-scsi, linux-block, linux-ide

Ideally block related code would standardize on using dev_printk,
but dev_printk does change the user visible messages which is
questionable.  Adding this function which adds the structured
key/value durable name to the log entry.  It has the
same signature as dev_printk.  In the future, code that
is using this could easily transition to dev_printk when that
becomes workable.

Signed-off-by: Tony Asleson <tasleson@redhat.com>
---
 drivers/base/core.c        | 15 +++++++++++++++
 include/linux/dev_printk.h |  5 +++++
 2 files changed, 20 insertions(+)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 72a93b041a2d..447b0ebc93af 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -3975,6 +3975,21 @@ void dev_printk(const char *level, const struct device *dev,
 }
 EXPORT_SYMBOL(dev_printk);
 
+void durable_name_printk(const char *level, const struct device *dev,
+		const char *fmt, ...)
+{
+	size_t dictlen;
+	va_list args;
+	char dict[288];
+
+	dictlen = dev_durable_name(dev, dict, sizeof(dict));
+
+	va_start(args, fmt);
+	vprintk_emit(0, level[1] - '0', dict, dictlen, fmt, args);
+	va_end(args);
+}
+EXPORT_SYMBOL(durable_name_printk);
+
 #define define_dev_printk_level(func, kern_level)		\
 void func(const struct device *dev, const char *fmt, ...)	\
 {								\
diff --git a/include/linux/dev_printk.h b/include/linux/dev_printk.h
index 3028b644b4fb..4d57b940b692 100644
--- a/include/linux/dev_printk.h
+++ b/include/linux/dev_printk.h
@@ -32,6 +32,11 @@ int dev_printk_emit(int level, const struct device *dev, const char *fmt, ...);
 __printf(3, 4) __cold
 void dev_printk(const char *level, const struct device *dev,
 		const char *fmt, ...);
+
+__printf(3, 4) __cold
+void durable_name_printk(const char *level, const struct device *dev,
+			const char *fmt, ...);
+
 __printf(2, 3) __cold
 void _dev_emerg(const struct device *dev, const char *fmt, ...);
 __printf(2, 3) __cold
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [v5 09/12] libata: use durable_name_printk
  2020-09-25 16:19 [v5 00/12] Add persistent durable identifier to storage log messages Tony Asleson
                   ` (7 preceding siblings ...)
  2020-09-25 16:19 ` [v5 08/12] Add durable_name_printk Tony Asleson
@ 2020-09-25 16:19 ` Tony Asleson
  2020-09-25 16:19 ` [v5 10/12] Add durable_name_printk_ratelimited Tony Asleson
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 35+ messages in thread
From: Tony Asleson @ 2020-09-25 16:19 UTC (permalink / raw)
  To: linux-scsi, linux-block, linux-ide

Utilize durable_name_printk to associate the durable name
with the log message via structured data.  The user visible
portion of the log message is unchanged.

Signed-off-by: Tony Asleson <tasleson@redhat.com>
---
 drivers/ata/libata-core.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index b1cd4d97bc2a..11200b861ce8 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -6443,7 +6443,8 @@ void ata_port_printk(const struct ata_port *ap, const char *level,
 	vaf.fmt = fmt;
 	vaf.va = &args;
 
-	printk("%sata%u: %pV", level, ap->print_id, &vaf);
+	durable_name_printk(level, &ap->tdev, "ata%u: %pV",
+		ap->print_id, &vaf);
 
 	va_end(args);
 }
@@ -6461,11 +6462,11 @@ void ata_link_printk(const struct ata_link *link, const char *level,
 	vaf.va = &args;
 
 	if (sata_pmp_attached(link->ap) || link->ap->slave_link)
-		printk("%sata%u.%02u: %pV",
-		       level, link->ap->print_id, link->pmp, &vaf);
+		durable_name_printk(level, &link->tdev, "ata%u.%02u: %pV",
+		       link->ap->print_id, link->pmp, &vaf);
 	else
-		printk("%sata%u: %pV",
-		       level, link->ap->print_id, &vaf);
+		durable_name_printk(level, &link->tdev, "ata%u: %pV",
+		       link->ap->print_id, &vaf);
 
 	va_end(args);
 }
@@ -6482,9 +6483,9 @@ void ata_dev_printk(const struct ata_device *dev, const char *level,
 	vaf.fmt = fmt;
 	vaf.va = &args;
 
-	printk("%sata%u.%02u: %pV",
-	       level, dev->link->ap->print_id, dev->link->pmp + dev->devno,
-	       &vaf);
+	durable_name_printk(level, &dev->tdev, "ata%u.%02u: %pV",
+		dev->link->ap->print_id, dev->link->pmp + dev->devno,
+		&vaf);
 
 	va_end(args);
 }
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [v5 10/12] Add durable_name_printk_ratelimited
  2020-09-25 16:19 [v5 00/12] Add persistent durable identifier to storage log messages Tony Asleson
                   ` (8 preceding siblings ...)
  2020-09-25 16:19 ` [v5 09/12] libata: use durable_name_printk Tony Asleson
@ 2020-09-25 16:19 ` Tony Asleson
  2020-09-25 16:19 ` [v5 11/12] print_req_error: Use durable_name_printk_ratelimited Tony Asleson
  2020-09-25 16:19 ` [v5 12/12] buffer_io_error: " Tony Asleson
  11 siblings, 0 replies; 35+ messages in thread
From: Tony Asleson @ 2020-09-25 16:19 UTC (permalink / raw)
  To: linux-scsi, linux-block, linux-ide

Create a rate limited macro for durable_name_printk so that
we can use this for printk_ratelimited usage in the block
layers and add the durable name key/value to the log
message.

Signed-off-by: Tony Asleson <tasleson@redhat.com>
---
 include/linux/dev_printk.h | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/include/linux/dev_printk.h b/include/linux/dev_printk.h
index 4d57b940b692..9fd675b9ac7c 100644
--- a/include/linux/dev_printk.h
+++ b/include/linux/dev_printk.h
@@ -37,6 +37,15 @@ __printf(3, 4) __cold
 void durable_name_printk(const char *level, const struct device *dev,
 			const char *fmt, ...);
 
+#define durable_name_printk_ratelimited(level, dev, fmt, ...)		\
+do {									\
+	static DEFINE_RATELIMIT_STATE(_rs,				\
+					DEFAULT_RATELIMIT_INTERVAL,	\
+					DEFAULT_RATELIMIT_BURST);	\
+	if (__ratelimit(&_rs))						\
+		durable_name_printk(level, dev, fmt, ##__VA_ARGS__);	\
+} while (0)
+
 __printf(2, 3) __cold
 void _dev_emerg(const struct device *dev, const char *fmt, ...);
 __printf(2, 3) __cold
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [v5 11/12] print_req_error: Use durable_name_printk_ratelimited
  2020-09-25 16:19 [v5 00/12] Add persistent durable identifier to storage log messages Tony Asleson
                   ` (9 preceding siblings ...)
  2020-09-25 16:19 ` [v5 10/12] Add durable_name_printk_ratelimited Tony Asleson
@ 2020-09-25 16:19 ` Tony Asleson
  2020-09-25 16:19 ` [v5 12/12] buffer_io_error: " Tony Asleson
  11 siblings, 0 replies; 35+ messages in thread
From: Tony Asleson @ 2020-09-25 16:19 UTC (permalink / raw)
  To: linux-scsi, linux-block, linux-ide

Replace printk_ratelimited with one that adds the key/value
durable name to log entry.

Signed-off-by: Tony Asleson <tasleson@redhat.com>
---
 block/blk-core.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 03252af8c82c..59e0ff583eb6 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -218,12 +218,15 @@ EXPORT_SYMBOL_GPL(blk_status_to_errno);
 static void print_req_error(struct request *req, blk_status_t status,
 		const char *caller)
 {
+	struct device *dev;
 	int idx = (__force int)status;
 
 	if (WARN_ON_ONCE(idx >= ARRAY_SIZE(blk_errors)))
 		return;
 
-	printk_ratelimited(KERN_ERR
+	dev = req->rq_disk ? disk_to_dev(req->rq_disk) : NULL;
+
+	durable_name_printk_ratelimited(KERN_ERR, dev,
 		"%s: %s error, dev %s, sector %llu op 0x%x:(%s) flags 0x%x "
 		"phys_seg %u prio class %u\n",
 		caller, blk_errors[idx].name,
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [v5 12/12] buffer_io_error: Use durable_name_printk_ratelimited
  2020-09-25 16:19 [v5 00/12] Add persistent durable identifier to storage log messages Tony Asleson
                   ` (10 preceding siblings ...)
  2020-09-25 16:19 ` [v5 11/12] print_req_error: Use durable_name_printk_ratelimited Tony Asleson
@ 2020-09-25 16:19 ` Tony Asleson
  11 siblings, 0 replies; 35+ messages in thread
From: Tony Asleson @ 2020-09-25 16:19 UTC (permalink / raw)
  To: linux-scsi, linux-block, linux-ide

Replace printk_ratelimited with one that adds the key/value
durable name to log entry.

Signed-off-by: Tony Asleson <tasleson@redhat.com>
---
 fs/buffer.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index 64fe82ec65ff..5c4e5b4c82dd 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -125,10 +125,17 @@ EXPORT_SYMBOL(__wait_on_buffer);
 
 static void buffer_io_error(struct buffer_head *bh, char *msg)
 {
-	if (!test_bit(BH_Quiet, &bh->b_state))
-		printk_ratelimited(KERN_ERR
-			"Buffer I/O error on dev %pg, logical block %llu%s\n",
-			bh->b_bdev, (unsigned long long)bh->b_blocknr, msg);
+	struct device *gendev;
+
+	if (test_bit(BH_Quiet, &bh->b_state))
+		return;
+
+	gendev = bh->b_bdev->bd_disk ?
+		disk_to_dev(bh->b_bdev->bd_disk) : NULL;
+
+	durable_name_printk_ratelimited(KERN_ERR, gendev,
+		"Buffer I/O error on dev %pg, logical block %llu%s\n",
+		bh->b_bdev, (unsigned long long)bh->b_blocknr, msg);
 }
 
 /*
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [v5 07/12] libata: Make ata_scsi_durable_name static
  2020-09-25 16:19 ` [v5 07/12] libata: Make ata_scsi_durable_name static Tony Asleson
@ 2020-09-26  8:40   ` Sergei Shtylyov
  2020-09-26 14:17     ` Tony Asleson
  2020-09-28 20:35     ` Tony Asleson
  0 siblings, 2 replies; 35+ messages in thread
From: Sergei Shtylyov @ 2020-09-26  8:40 UTC (permalink / raw)
  To: Tony Asleson, linux-scsi, linux-block, linux-ide

Hello!

On 25.09.2020 19:19, Tony Asleson wrote:

> Signed-off-by: Tony Asleson <tasleson@redhat.com>
> Signed-off-by: kernel test robot <lkp@intel.com>
> ---
>   drivers/ata/libata-scsi.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
> index 194dac7dbdca..13a58ed7184c 100644
> --- a/drivers/ata/libata-scsi.c
> +++ b/drivers/ata/libata-scsi.c
> @@ -1086,7 +1086,7 @@ int ata_scsi_dev_config(struct scsi_device *sdev, struct ata_device *dev)
>   	return 0;
>   }
>   
> -int ata_scsi_durable_name(const struct device *dev, char *buf, size_t len)
> +static int ata_scsi_durable_name(const struct device *dev, char *buf, size_t len)

    Why not do it in patch #6 -- when introducing the function?

[...]

MBR, Sergei

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [v5 01/12] struct device: Add function callback durable_name
  2020-09-25 16:19 ` [v5 01/12] struct device: Add function callback durable_name Tony Asleson
@ 2020-09-26  9:08   ` Sergei Shtylyov
  2020-09-27 14:22     ` Tony Asleson
  2020-09-29 17:51   ` Christoph Hellwig
  1 sibling, 1 reply; 35+ messages in thread
From: Sergei Shtylyov @ 2020-09-26  9:08 UTC (permalink / raw)
  To: Tony Asleson, linux-scsi, linux-block, linux-ide

On 25.09.2020 19:19, Tony Asleson wrote:

> Function callback and function to be used to write a persistent
> durable name to the supplied character buffer.  This will be used to add
> structured key-value data to log messages for hardware related errors
> which allows end users to correlate message and specific hardware.
> 
> Signed-off-by: Tony Asleson <tasleson@redhat.com>
> ---
>   drivers/base/core.c    | 24 ++++++++++++++++++++++++
>   include/linux/device.h |  4 ++++
>   2 files changed, 28 insertions(+)
> 
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index 05d414e9e8a4..88696ade8bfc 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -2489,6 +2489,30 @@ int dev_set_name(struct device *dev, const char *fmt, ...)
>   }
>   EXPORT_SYMBOL_GPL(dev_set_name);
>   
> +/**
> + * dev_durable_name - Write "DURABLE_NAME"=<durable name> in buffer
> + * @dev: device
> + * @buffer: character buffer to write results
> + * @len: length of buffer
> + * @return: Number of bytes written to buffer

    This is not how the kernel-doc commenta describe the function result, IIRC...

> + */
> +int dev_durable_name(const struct device *dev, char *buffer, size_t len)
> +{
> +	int tmp, dlen;
> +
> +	if (dev && dev->durable_name) {
> +		tmp = snprintf(buffer, len, "DURABLE_NAME=");
> +		if (tmp < len) {
> +			dlen = dev->durable_name(dev, buffer + tmp,
> +							len - tmp);
> +			if (dlen > 0 && ((dlen + tmp) < len))
> +				return dlen + tmp;
> +		}
> +	}
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(dev_durable_name);
> +
>   /**
>    * device_to_dev_kobj - select a /sys/dev/ directory for the device
>    * @dev: device
[...]

MBR, Sergei

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [v5 07/12] libata: Make ata_scsi_durable_name static
  2020-09-26  8:40   ` Sergei Shtylyov
@ 2020-09-26 14:17     ` Tony Asleson
  2020-09-26 15:57       ` James Bottomley
  2020-09-28 20:35     ` Tony Asleson
  1 sibling, 1 reply; 35+ messages in thread
From: Tony Asleson @ 2020-09-26 14:17 UTC (permalink / raw)
  To: Sergei Shtylyov, linux-scsi, linux-block, linux-ide

On 9/26/20 3:40 AM, Sergei Shtylyov wrote:
> Hello!
> 
> On 25.09.2020 19:19, Tony Asleson wrote:
> 
>> Signed-off-by: Tony Asleson <tasleson@redhat.com>
>> Signed-off-by: kernel test robot <lkp@intel.com>
>> ---
>>   drivers/ata/libata-scsi.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
>> index 194dac7dbdca..13a58ed7184c 100644
>> --- a/drivers/ata/libata-scsi.c
>> +++ b/drivers/ata/libata-scsi.c
>> @@ -1086,7 +1086,7 @@ int ata_scsi_dev_config(struct scsi_device
>> *sdev, struct ata_device *dev)
>>       return 0;
>>   }
>>   -int ata_scsi_durable_name(const struct device *dev, char *buf,
>> size_t len)
>> +static int ata_scsi_durable_name(const struct device *dev, char *buf,
>> size_t len)
> 
>    Why not do it in patch #6 -- when introducing the function?

This issue was found by the intel kernel test robot in v4 patch series.
I thought it was better to have a separate commit with the correction
that matched it's signed off.  Maybe that's not the correct approach?



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [v5 07/12] libata: Make ata_scsi_durable_name static
  2020-09-26 14:17     ` Tony Asleson
@ 2020-09-26 15:57       ` James Bottomley
  0 siblings, 0 replies; 35+ messages in thread
From: James Bottomley @ 2020-09-26 15:57 UTC (permalink / raw)
  To: tasleson, Sergei Shtylyov, linux-scsi, linux-block, linux-ide

On Sat, 2020-09-26 at 09:17 -0500, Tony Asleson wrote:
> On 9/26/20 3:40 AM, Sergei Shtylyov wrote:
> > Hello!
> > 
> > On 25.09.2020 19:19, Tony Asleson wrote:
> > 
> > > Signed-off-by: Tony Asleson <tasleson@redhat.com>
> > > Signed-off-by: kernel test robot <lkp@intel.com>
> > > ---
> > >   drivers/ata/libata-scsi.c | 2 +-
> > >   1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-
> > > scsi.c
> > > index 194dac7dbdca..13a58ed7184c 100644
> > > --- a/drivers/ata/libata-scsi.c
> > > +++ b/drivers/ata/libata-scsi.c
> > > @@ -1086,7 +1086,7 @@ int ata_scsi_dev_config(struct scsi_device
> > > *sdev, struct ata_device *dev)
> > >       return 0;
> > >   }
> > >   -int ata_scsi_durable_name(const struct device *dev, char *buf,
> > > size_t len)
> > > +static int ata_scsi_durable_name(const struct device *dev, char
> > > *buf,
> > > size_t len)
> > 
> >    Why not do it in patch #6 -- when introducing the function?
> 
> This issue was found by the intel kernel test robot in v4 patch
> series. I thought it was better to have a separate commit with the
> correction that matched it's signed off.  Maybe that's not the
> correct approach?

No ... while a patch is being reviewed the purpose of review is to make
it better by folding in all the comments.  It then gets a changelog put
below the 

---

v5: made X function static

So people can follow how it has evolved.  This is all actually
described in Documentation/submitting-patches.

James



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [v5 08/12] Add durable_name_printk
  2020-09-25 16:19 ` [v5 08/12] Add durable_name_printk Tony Asleson
@ 2020-09-26 23:53   ` Randy Dunlap
  2020-09-28 15:52     ` Tony Asleson
  0 siblings, 1 reply; 35+ messages in thread
From: Randy Dunlap @ 2020-09-26 23:53 UTC (permalink / raw)
  To: Tony Asleson, linux-scsi, linux-block, linux-ide

On 9/25/20 9:19 AM, Tony Asleson wrote:
> Ideally block related code would standardize on using dev_printk,
> but dev_printk does change the user visible messages which is
> questionable.  Adding this function which adds the structured
> key/value durable name to the log entry.  It has the
> same signature as dev_printk.  In the future, code that
> is using this could easily transition to dev_printk when that
> becomes workable.
> 
> Signed-off-by: Tony Asleson <tasleson@redhat.com>
> ---
>  drivers/base/core.c        | 15 +++++++++++++++
>  include/linux/dev_printk.h |  5 +++++
>  2 files changed, 20 insertions(+)

Hi,

I suggest that these 2 new function names should be
	printk_durable_name()
and
	printk_durable_name_ratelimited()

Those names would be closer to the printk* family of
function names.  Of course, you can find exceptions to this,
like dev_printk(), but that is in the dev_*() family of
function names.


> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index 72a93b041a2d..447b0ebc93af 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -3975,6 +3975,21 @@ void dev_printk(const char *level, const struct device *dev,
>  }
>  EXPORT_SYMBOL(dev_printk);
>  
> +void durable_name_printk(const char *level, const struct device *dev,
> +		const char *fmt, ...)
> +{
> +	size_t dictlen;
> +	va_list args;
> +	char dict[288];
> +
> +	dictlen = dev_durable_name(dev, dict, sizeof(dict));
> +
> +	va_start(args, fmt);
> +	vprintk_emit(0, level[1] - '0', dict, dictlen, fmt, args);
> +	va_end(args);
> +}
> +EXPORT_SYMBOL(durable_name_printk);
> +
>  #define define_dev_printk_level(func, kern_level)		\
>  void func(const struct device *dev, const char *fmt, ...)	\
>  {								\
> diff --git a/include/linux/dev_printk.h b/include/linux/dev_printk.h
> index 3028b644b4fb..4d57b940b692 100644
> --- a/include/linux/dev_printk.h
> +++ b/include/linux/dev_printk.h
> @@ -32,6 +32,11 @@ int dev_printk_emit(int level, const struct device *dev, const char *fmt, ...);
>  __printf(3, 4) __cold
>  void dev_printk(const char *level, const struct device *dev,
>  		const char *fmt, ...);
> +
> +__printf(3, 4) __cold
> +void durable_name_printk(const char *level, const struct device *dev,
> +			const char *fmt, ...);
> +
>  __printf(2, 3) __cold
>  void _dev_emerg(const struct device *dev, const char *fmt, ...);
>  __printf(2, 3) __cold
> 

Thanks.
-- 
~Randy


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [v5 01/12] struct device: Add function callback durable_name
  2020-09-26  9:08   ` Sergei Shtylyov
@ 2020-09-27 14:22     ` Tony Asleson
  2020-09-27 16:15       ` Sergei Shtylyov
  0 siblings, 1 reply; 35+ messages in thread
From: Tony Asleson @ 2020-09-27 14:22 UTC (permalink / raw)
  To: Sergei Shtylyov, linux-scsi, linux-block, linux-ide

On 9/26/20 4:08 AM, Sergei Shtylyov wrote:
> On 25.09.2020 19:19, Tony Asleson wrote:
> 
>> Function callback and function to be used to write a persistent
>> durable name to the supplied character buffer.  This will be used to add
>> structured key-value data to log messages for hardware related errors
>> which allows end users to correlate message and specific hardware.
>>
>> Signed-off-by: Tony Asleson <tasleson@redhat.com>
>> ---
>>   drivers/base/core.c    | 24 ++++++++++++++++++++++++
>>   include/linux/device.h |  4 ++++
>>   2 files changed, 28 insertions(+)
>>
>> diff --git a/drivers/base/core.c b/drivers/base/core.c
>> index 05d414e9e8a4..88696ade8bfc 100644
>> --- a/drivers/base/core.c
>> +++ b/drivers/base/core.c
>> @@ -2489,6 +2489,30 @@ int dev_set_name(struct device *dev, const char
>> *fmt, ...)
>>   }
>>   EXPORT_SYMBOL_GPL(dev_set_name);
>>   +/**
>> + * dev_durable_name - Write "DURABLE_NAME"=<durable name> in buffer
>> + * @dev: device
>> + * @buffer: character buffer to write results
>> + * @len: length of buffer
>> + * @return: Number of bytes written to buffer
> 
>    This is not how the kernel-doc commenta describe the function result,
> IIRC...

I did my compile with `make  W=1` and there isn't any warnings/error
with source documentation, but the documentation does indeed outline a
different syntax.  It's interesting how common the @return syntax is in
the existing code base.

I'll re-work the function documentation return.

Thanks


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [v5 01/12] struct device: Add function callback durable_name
  2020-09-27 14:22     ` Tony Asleson
@ 2020-09-27 16:15       ` Sergei Shtylyov
  0 siblings, 0 replies; 35+ messages in thread
From: Sergei Shtylyov @ 2020-09-27 16:15 UTC (permalink / raw)
  To: tasleson, linux-scsi, linux-block, linux-ide

On 27.09.2020 17:22, Tony Asleson wrote:

>>> Function callback and function to be used to write a persistent
>>> durable name to the supplied character buffer.  This will be used to add
>>> structured key-value data to log messages for hardware related errors
>>> which allows end users to correlate message and specific hardware.
>>>
>>> Signed-off-by: Tony Asleson <tasleson@redhat.com>
>>> ---
>>>    drivers/base/core.c    | 24 ++++++++++++++++++++++++
>>>    include/linux/device.h |  4 ++++
>>>    2 files changed, 28 insertions(+)
>>>
>>> diff --git a/drivers/base/core.c b/drivers/base/core.c
>>> index 05d414e9e8a4..88696ade8bfc 100644
>>> --- a/drivers/base/core.c
>>> +++ b/drivers/base/core.c
>>> @@ -2489,6 +2489,30 @@ int dev_set_name(struct device *dev, const char
>>> *fmt, ...)
>>>    }
>>>    EXPORT_SYMBOL_GPL(dev_set_name);
>>>    +/**
>>> + * dev_durable_name - Write "DURABLE_NAME"=<durable name> in buffer
>>> + * @dev: device
>>> + * @buffer: character buffer to write results
>>> + * @len: length of buffer
>>> + * @return: Number of bytes written to buffer
>>
>>     This is not how the kernel-doc commenta describe the function result,
>> IIRC...
> 
> I did my compile with `make  W=1` and there isn't any warnings/error
> with source documentation, but the documentation does indeed outline a

    IIRC, you only get the warnings when you try to build the kernel-docs.

> different syntax.  It's interesting how common the @return syntax is in
> the existing code base.

    FWIW, I'm seeing @return: for the 1st time in my Linux tenure (since 2004).

> I'll re-work the function documentation return.

    OK, thanks. :-)

> Thanks

MBR, Sergei

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [v5 08/12] Add durable_name_printk
  2020-09-26 23:53   ` Randy Dunlap
@ 2020-09-28 15:52     ` Tony Asleson
  2020-09-28 17:32       ` Randy Dunlap
  0 siblings, 1 reply; 35+ messages in thread
From: Tony Asleson @ 2020-09-28 15:52 UTC (permalink / raw)
  To: Randy Dunlap, linux-scsi, linux-block, linux-ide

On 9/26/20 6:53 PM, Randy Dunlap wrote:
> I suggest that these 2 new function names should be
> 	printk_durable_name()
> and
> 	printk_durable_name_ratelimited()
> 
> Those names would be closer to the printk* family of
> function names.  Of course, you can find exceptions to this,
> like dev_printk(), but that is in the dev_*() family of
> function names.

durable_name_printk has the same argument signature as dev_printk with
it's intention that in the future it might be a candidate to get changed
to dev_printk.  The reason I'm not using dev_printk is to avoid changing
the content of the message users see.

With this clarification, do you still suggest the rename or maybe
suggest something different?

dev_id_printk
id_printk
...

I'm also thinking that maybe we should add a new function do everything
dev_printk does, but without prepending the device driver name and
device name to the message.  So we can get the metadata adds without
having the content of the message change.

Thanks


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [v5 08/12] Add durable_name_printk
  2020-09-28 15:52     ` Tony Asleson
@ 2020-09-28 17:32       ` Randy Dunlap
  0 siblings, 0 replies; 35+ messages in thread
From: Randy Dunlap @ 2020-09-28 17:32 UTC (permalink / raw)
  To: tasleson, linux-scsi, linux-block, linux-ide

On 9/28/20 8:52 AM, Tony Asleson wrote:
> On 9/26/20 6:53 PM, Randy Dunlap wrote:
>> I suggest that these 2 new function names should be
>> 	printk_durable_name()
>> and
>> 	printk_durable_name_ratelimited()
>>
>> Those names would be closer to the printk* family of
>> function names.  Of course, you can find exceptions to this,
>> like dev_printk(), but that is in the dev_*() family of
>> function names.
> 
> durable_name_printk has the same argument signature as dev_printk with
> it's intention that in the future it might be a candidate to get changed
> to dev_printk.  The reason I'm not using dev_printk is to avoid changing
> the content of the message users see.
> 
> With this clarification, do you still suggest the rename or maybe
> suggest something different?

Since you seem to bring it up, "durable_name" is a bit long IMO.

But yes, I still prefer printk_durable_name() etc. The other order seems
backwards to me. But that's still just an opinion.


> dev_id_printk
> id_printk
> ...
> 
> I'm also thinking that maybe we should add a new function do everything
> dev_printk does, but without prepending the device driver name and
> device name to the message.  So we can get the metadata adds without
> having the content of the message change.

thanks.
-- 
~Randy
Reported-by: Randy Dunlap <rdunlap@infradead.org>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [v5 07/12] libata: Make ata_scsi_durable_name static
  2020-09-26  8:40   ` Sergei Shtylyov
  2020-09-26 14:17     ` Tony Asleson
@ 2020-09-28 20:35     ` Tony Asleson
  1 sibling, 0 replies; 35+ messages in thread
From: Tony Asleson @ 2020-09-28 20:35 UTC (permalink / raw)
  To: Sergei Shtylyov, linux-scsi, linux-block, linux-ide

On 9/26/20 3:40 AM, Sergei Shtylyov wrote:
> 
>    Why not do it in patch #6 -- when introducing the function?

Re-working patch series, will do as you suggest and as outlined
in the submitting-patches.rst referenced by James.

Thanks


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [v5 01/12] struct device: Add function callback durable_name
  2020-09-25 16:19 ` [v5 01/12] struct device: Add function callback durable_name Tony Asleson
  2020-09-26  9:08   ` Sergei Shtylyov
@ 2020-09-29 17:51   ` Christoph Hellwig
  2020-09-29 18:04     ` Greg Kroah-Hartman
  1 sibling, 1 reply; 35+ messages in thread
From: Christoph Hellwig @ 2020-09-29 17:51 UTC (permalink / raw)
  To: Tony Asleson, Greg Kroah-Hartman; +Cc: linux-scsi, linux-block, linux-ide

Independ of my opinion on the whole scheme that I shared last time,
we really should not bloat struct device with function pointers.

On Fri, Sep 25, 2020 at 11:19:18AM -0500, Tony Asleson wrote:
> Function callback and function to be used to write a persistent
> durable name to the supplied character buffer.  This will be used to add
> structured key-value data to log messages for hardware related errors
> which allows end users to correlate message and specific hardware.
> 
> Signed-off-by: Tony Asleson <tasleson@redhat.com>
> ---
>  drivers/base/core.c    | 24 ++++++++++++++++++++++++
>  include/linux/device.h |  4 ++++
>  2 files changed, 28 insertions(+)
> 
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index 05d414e9e8a4..88696ade8bfc 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -2489,6 +2489,30 @@ int dev_set_name(struct device *dev, const char *fmt, ...)
>  }
>  EXPORT_SYMBOL_GPL(dev_set_name);
>  
> +/**
> + * dev_durable_name - Write "DURABLE_NAME"=<durable name> in buffer
> + * @dev: device
> + * @buffer: character buffer to write results
> + * @len: length of buffer
> + * @return: Number of bytes written to buffer
> + */
> +int dev_durable_name(const struct device *dev, char *buffer, size_t len)
> +{
> +	int tmp, dlen;
> +
> +	if (dev && dev->durable_name) {
> +		tmp = snprintf(buffer, len, "DURABLE_NAME=");
> +		if (tmp < len) {
> +			dlen = dev->durable_name(dev, buffer + tmp,
> +							len - tmp);
> +			if (dlen > 0 && ((dlen + tmp) < len))
> +				return dlen + tmp;
> +		}
> +	}
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(dev_durable_name);
> +
>  /**
>   * device_to_dev_kobj - select a /sys/dev/ directory for the device
>   * @dev: device
> diff --git a/include/linux/device.h b/include/linux/device.h
> index 5efed864b387..074125999dd8 100644
> --- a/include/linux/device.h
> +++ b/include/linux/device.h
> @@ -614,6 +614,8 @@ struct device {
>  	struct iommu_group	*iommu_group;
>  	struct dev_iommu	*iommu;
>  
> +	int (*durable_name)(const struct device *dev, char *buff, size_t len);
> +
>  	bool			offline_disabled:1;
>  	bool			offline:1;
>  	bool			of_node_reused:1;
> @@ -655,6 +657,8 @@ static inline const char *dev_name(const struct device *dev)
>  extern __printf(2, 3)
>  int dev_set_name(struct device *dev, const char *name, ...);
>  
> +int dev_durable_name(const struct device *d, char *buffer, size_t len);
> +
>  #ifdef CONFIG_NUMA
>  static inline int dev_to_node(struct device *dev)
>  {
> -- 
> 2.26.2
> 
---end quoted text---

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [v5 01/12] struct device: Add function callback durable_name
  2020-09-29 17:51   ` Christoph Hellwig
@ 2020-09-29 18:04     ` Greg Kroah-Hartman
  2020-09-29 22:04       ` Tony Asleson
  0 siblings, 1 reply; 35+ messages in thread
From: Greg Kroah-Hartman @ 2020-09-29 18:04 UTC (permalink / raw)
  To: Christoph Hellwig, Tony Asleson, linux-scsi, linux-block, linux-ide

On Tue, Sep 29, 2020 at 06:51:02PM +0100, Christoph Hellwig wrote:
> Independ of my opinion on the whole scheme that I shared last time,
> we really should not bloat struct device with function pointers.
> 
> On Fri, Sep 25, 2020 at 11:19:18AM -0500, Tony Asleson wrote:
> > Function callback and function to be used to write a persistent
> > durable name to the supplied character buffer.  This will be used to add
> > structured key-value data to log messages for hardware related errors
> > which allows end users to correlate message and specific hardware.
> > 
> > Signed-off-by: Tony Asleson <tasleson@redhat.com>
> > ---
> >  drivers/base/core.c    | 24 ++++++++++++++++++++++++
> >  include/linux/device.h |  4 ++++
> >  2 files changed, 28 insertions(+)

I can't find this patch anywhere in my archives, why was I not cc:ed on
it?  It's a v5 and no one thought to ask the driver core
developers/maintainers about it???

{sigh}

And for log messages, what about the dynamic debug developers, why not
include them as well?  Since when is this a storage-only thing?

> > 
> > diff --git a/drivers/base/core.c b/drivers/base/core.c
> > index 05d414e9e8a4..88696ade8bfc 100644
> > --- a/drivers/base/core.c
> > +++ b/drivers/base/core.c
> > @@ -2489,6 +2489,30 @@ int dev_set_name(struct device *dev, const char *fmt, ...)
> >  }
> >  EXPORT_SYMBOL_GPL(dev_set_name);
> >  
> > +/**
> > + * dev_durable_name - Write "DURABLE_NAME"=<durable name> in buffer
> > + * @dev: device
> > + * @buffer: character buffer to write results
> > + * @len: length of buffer
> > + * @return: Number of bytes written to buffer
> > + */
> > +int dev_durable_name(const struct device *dev, char *buffer, size_t len)
> > +{
> > +	int tmp, dlen;
> > +
> > +	if (dev && dev->durable_name) {
> > +		tmp = snprintf(buffer, len, "DURABLE_NAME=");
> > +		if (tmp < len) {
> > +			dlen = dev->durable_name(dev, buffer + tmp,
> > +							len - tmp);
> > +			if (dlen > 0 && ((dlen + tmp) < len))
> > +				return dlen + tmp;
> > +		}
> > +	}
> > +	return 0;
> > +}
> > +EXPORT_SYMBOL_GPL(dev_durable_name);
> > +
> >  /**
> >   * device_to_dev_kobj - select a /sys/dev/ directory for the device
> >   * @dev: device
> > diff --git a/include/linux/device.h b/include/linux/device.h
> > index 5efed864b387..074125999dd8 100644
> > --- a/include/linux/device.h
> > +++ b/include/linux/device.h
> > @@ -614,6 +614,8 @@ struct device {
> >  	struct iommu_group	*iommu_group;
> >  	struct dev_iommu	*iommu;
> >  
> > +	int (*durable_name)(const struct device *dev, char *buff, size_t len);

No, that's not ok at all, why is this even a thing?

Who is setting this?  Why can't the bus do this without anything
"special" needed from the driver core?

We have a mapping of 'struct device' to a unique hardware device at a
specific point in time, why are you trying to create another one?

What is wrong with what we have today?

So this is a HARD NAK on this patch for now.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [v5 01/12] struct device: Add function callback durable_name
  2020-09-29 18:04     ` Greg Kroah-Hartman
@ 2020-09-29 22:04       ` Tony Asleson
  2020-09-30  7:38         ` Greg Kroah-Hartman
  0 siblings, 1 reply; 35+ messages in thread
From: Tony Asleson @ 2020-09-29 22:04 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Christoph Hellwig, linux-scsi, linux-block,
	linux-ide
  Cc: Hannes Reinecke

On 9/29/20 1:04 PM, Greg Kroah-Hartman wrote:
> On Tue, Sep 29, 2020 at 06:51:02PM +0100, Christoph Hellwig wrote:
>> Independ of my opinion on the whole scheme that I shared last time,
>> we really should not bloat struct device with function pointers.

Christoph, thank you for sharing a bit more of your concerns and
bringing Greg into the discussion.  It's more productive.

>>
>> On Fri, Sep 25, 2020 at 11:19:18AM -0500, Tony Asleson wrote:
>>> Function callback and function to be used to write a persistent
>>> durable name to the supplied character buffer.  This will be used to add
>>> structured key-value data to log messages for hardware related errors
>>> which allows end users to correlate message and specific hardware.
>>>
>>> Signed-off-by: Tony Asleson <tasleson@redhat.com>
>>> ---
>>>  drivers/base/core.c    | 24 ++++++++++++++++++++++++
>>>  include/linux/device.h |  4 ++++
>>>  2 files changed, 28 insertions(+)
> 
> I can't find this patch anywhere in my archives, why was I not cc:ed on
> it?  It's a v5 and no one thought to ask the driver core
> developers/maintainers about it???

You were CC'd into v3, ref.

https://lore.kernel.org/linux-ide/20200714081750.GB862637@kroah.com/

I should have continued to CC you on it, sorry about that.


> {sigh}
> 
> And for log messages, what about the dynamic debug developers, why not
> include them as well?  Since when is this a storage-only thing?

Hannes Reinecke has been involved in the discussion some and he's
involved in dynamic debug AFAIK.

If others have a need to identify a specific piece of hardware in a
potential sea of identical hardware that is encountering errors and
logging messages and can optionally be added and removed at run-time,
then yes they should be included too.  There is nothing with this patch
series that is preventing any device from registering a callback which
provides this information when asked.


>>> diff --git a/include/linux/device.h b/include/linux/device.h
>>> index 5efed864b387..074125999dd8 100644
>>> --- a/include/linux/device.h
>>> +++ b/include/linux/device.h
>>> @@ -614,6 +614,8 @@ struct device {
>>>  	struct iommu_group	*iommu_group;
>>>  	struct dev_iommu	*iommu;
>>>  
>>> +	int (*durable_name)(const struct device *dev, char *buff, size_t len);
> 
> No, that's not ok at all, why is this even a thing?
> 
> Who is setting this?  Why can't the bus do this without anything
> "special" needed from the driver core?

I'm setting it in the different storage subsystems.  The intent is that
when we go through a common logging path eg. dev_printk you can ask the
device what it's ID is so that it can be logged as structured data with
the log message.  I was trying to avoid having separate logging
functions all over the place which enforces a consistent meta data to
log messages, but maybe that would be better than adding a function
pointer to struct device?  My first patch tried adding a call back to
struct device_type, but that ran into the issue where struct device_type
is static const in a number of areas.

Basically for any piece of hardware with a serial number, call this
function to get it.  That was the intent anyway.

> We have a mapping of 'struct device' to a unique hardware device at a
> specific point in time, why are you trying to create another one?

I don't think I'm creating anything new.  Can you clarify this a bit
more?  I'm trying to leverage what is already in place.

> What is wrong with what we have today?

I'm trying to figure out a way to positively identify which storage
device an error belongs to over time.  Logging how something is attached
and not what is attached, only solves for right now, this point in time.
 Additionally when the only identifier is in the actual message itself,
it makes user space break every time the message content changes.  Also
what we have today in dev_printk doesn't tag the meta-data consistently
for journald to leverage.

This patch series adds structured data to the log entry for positive
identification.  It doesn't change when the content of the log message
changes.  It's true over time and it's true if a device gets moved to a
different system.

> So this is a HARD NAK on this patch for now.

Thank you for supplying some feedback and asking questions.  I've been
asking for suggestions and would very much like to have a discussion on
how this issue is best solved.  I'm not attached to what I've provided.
I'm just trying to get towards a solution.


We've looked at user space quite a bit and there is an inherit race
condition with trying to fetch the unique hardware id for a message when
it gets emitted from the kernel as udev rules haven't even run (assuming
we even have the meta-data to make the association).  The last thing
people want to do is delay writing the log message to disk until the
device it belongs to can be identified.  Of course this patch series
still has a window from where the device is first discovered by the
kernel and fetches the needed vpd data from the device.  Any kernel
messages logged during that time have no id to associate with it.

Thanks,
Tony



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [v5 01/12] struct device: Add function callback durable_name
  2020-09-29 22:04       ` Tony Asleson
@ 2020-09-30  7:38         ` Greg Kroah-Hartman
  2020-09-30  7:40           ` Greg Kroah-Hartman
  2020-09-30 14:35           ` Tony Asleson
  0 siblings, 2 replies; 35+ messages in thread
From: Greg Kroah-Hartman @ 2020-09-30  7:38 UTC (permalink / raw)
  To: Tony Asleson
  Cc: Christoph Hellwig, linux-scsi, linux-block, linux-ide, Hannes Reinecke

On Tue, Sep 29, 2020 at 05:04:32PM -0500, Tony Asleson wrote:
> On 9/29/20 1:04 PM, Greg Kroah-Hartman wrote:
> > On Tue, Sep 29, 2020 at 06:51:02PM +0100, Christoph Hellwig wrote:
> >> Independ of my opinion on the whole scheme that I shared last time,
> >> we really should not bloat struct device with function pointers.
> 
> Christoph, thank you for sharing a bit more of your concerns and
> bringing Greg into the discussion.  It's more productive.
> 
> >>
> >> On Fri, Sep 25, 2020 at 11:19:18AM -0500, Tony Asleson wrote:
> >>> Function callback and function to be used to write a persistent
> >>> durable name to the supplied character buffer.  This will be used to add
> >>> structured key-value data to log messages for hardware related errors
> >>> which allows end users to correlate message and specific hardware.
> >>>
> >>> Signed-off-by: Tony Asleson <tasleson@redhat.com>
> >>> ---
> >>>  drivers/base/core.c    | 24 ++++++++++++++++++++++++
> >>>  include/linux/device.h |  4 ++++
> >>>  2 files changed, 28 insertions(+)
> > 
> > I can't find this patch anywhere in my archives, why was I not cc:ed on
> > it?  It's a v5 and no one thought to ask the driver core
> > developers/maintainers about it???
> 
> You were CC'd into v3, ref.
> 
> https://lore.kernel.org/linux-ide/20200714081750.GB862637@kroah.com/

Yeah, and I rejected that patch too, no wonder you didn't include me in
further patch reviews, you didn't want me to reject them either :)

> > {sigh}
> > 
> > And for log messages, what about the dynamic debug developers, why not
> > include them as well?  Since when is this a storage-only thing?
> 
> Hannes Reinecke has been involved in the discussion some and he's
> involved in dynamic debug AFAIK.

From the maintainers file:
	DYNAMIC DEBUG
	M:      Jason Baron <jbaron@akamai.com>
	S:      Maintained
	F:      include/linux/dynamic_debug.h
	F:      lib/dynamic_debug.c

Come on, you know this, don't try to avoid the people who have to
maintain the code you are wanting to change, that's not ok.

> If others have a need to identify a specific piece of hardware in a
> potential sea of identical hardware that is encountering errors and
> logging messages and can optionally be added and removed at run-time,
> then yes they should be included too.  There is nothing with this patch
> series that is preventing any device from registering a callback which
> provides this information when asked.

But that's not what the kernel is supposed to be doing, it doesn't care
about tracking things outside of the lifetime from when it can see a
device.  That's what userspace is for.

> >>> diff --git a/include/linux/device.h b/include/linux/device.h
> >>> index 5efed864b387..074125999dd8 100644
> >>> --- a/include/linux/device.h
> >>> +++ b/include/linux/device.h
> >>> @@ -614,6 +614,8 @@ struct device {
> >>>  	struct iommu_group	*iommu_group;
> >>>  	struct dev_iommu	*iommu;
> >>>  
> >>> +	int (*durable_name)(const struct device *dev, char *buff, size_t len);
> > 
> > No, that's not ok at all, why is this even a thing?
> > 
> > Who is setting this?  Why can't the bus do this without anything
> > "special" needed from the driver core?
> 
> I'm setting it in the different storage subsystems.  The intent is that
> when we go through a common logging path eg. dev_printk you can ask the
> device what it's ID is so that it can be logged as structured data with
> the log message.  I was trying to avoid having separate logging
> functions all over the place which enforces a consistent meta data to
> log messages, but maybe that would be better than adding a function
> pointer to struct device?  My first patch tried adding a call back to
> struct device_type, but that ran into the issue where struct device_type
> is static const in a number of areas.
> 
> Basically for any piece of hardware with a serial number, call this
> function to get it.  That was the intent anyway.

That's not ok, again, if you really want something crazy like this, then
modify your logging messages to provide it.  But I would strongly argue
that you do not really need this, as you have this information at
runtime, in userspace, already.

> > We have a mapping of 'struct device' to a unique hardware device at a
> > specific point in time, why are you trying to create another one?
> 
> I don't think I'm creating anything new.  Can you clarify this a bit
> more?  I'm trying to leverage what is already in place.
> 
> > What is wrong with what we have today?
> 
> I'm trying to figure out a way to positively identify which storage
> device an error belongs to over time.

"over time" is not the kernel's responsibility.

This comes up every 5 years or so. The kernel provides you, at runtime,
a mapping between a hardware device and a "logical" device.  It can
provide information to userspace about this mapping, but once that
device goes away, the kernel is free to reuse that logical device again.

If you want to track what logical devices match up to what physical
device, then do it in userspace, by parsing the log files.  All of the
needed information is already there, you don't have to add anything new
to it.

Again, this comes up every few years for some reason, because people
feel it is eaasier to modify the kernel than work with userspace
programs.  Odd...

> > So this is a HARD NAK on this patch for now.
> 
> Thank you for supplying some feedback and asking questions.  I've been
> asking for suggestions and would very much like to have a discussion on
> how this issue is best solved.  I'm not attached to what I've provided.
> I'm just trying to get towards a solution.

Again, solve this in userspace, you have the information there at
runtime, why not use it?

> We've looked at user space quite a bit and there is an inherit race
> condition with trying to fetch the unique hardware id for a message when
> it gets emitted from the kernel as udev rules haven't even run (assuming
> we even have the meta-data to make the association).

But one moment later you do have the information, so you can properly
correlate it, right?

> The last thing
> people want to do is delay writing the log message to disk until the
> device it belongs to can be identified.  Of course this patch series
> still has a window from where the device is first discovered by the
> kernel and fetches the needed vpd data from the device.  Any kernel
> messages logged during that time have no id to associate with it.

No need to delay logging, you are looking at things way later in time,
so there's no issue of race conditions or anything else.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [v5 01/12] struct device: Add function callback durable_name
  2020-09-30  7:38         ` Greg Kroah-Hartman
@ 2020-09-30  7:40           ` Greg Kroah-Hartman
  2020-09-30 14:35           ` Tony Asleson
  1 sibling, 0 replies; 35+ messages in thread
From: Greg Kroah-Hartman @ 2020-09-30  7:40 UTC (permalink / raw)
  To: Tony Asleson
  Cc: Christoph Hellwig, linux-scsi, linux-block, linux-ide, Hannes Reinecke

On Wed, Sep 30, 2020 at 09:38:59AM +0200, Greg Kroah-Hartman wrote:
> > > {sigh}
> > > 
> > > And for log messages, what about the dynamic debug developers, why not
> > > include them as well?  Since when is this a storage-only thing?
> > 
> > Hannes Reinecke has been involved in the discussion some and he's
> > involved in dynamic debug AFAIK.
> 
> From the maintainers file:
> 	DYNAMIC DEBUG
> 	M:      Jason Baron <jbaron@akamai.com>
> 	S:      Maintained
> 	F:      include/linux/dynamic_debug.h
> 	F:      lib/dynamic_debug.c
> 
> Come on, you know this, don't try to avoid the people who have to
> maintain the code you are wanting to change, that's not ok.

Also, why are you not using scripts/get_maintainer.pl on your patches?
It would have told you about this...

greg k-h

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [v5 01/12] struct device: Add function callback durable_name
  2020-09-30  7:38         ` Greg Kroah-Hartman
  2020-09-30  7:40           ` Greg Kroah-Hartman
@ 2020-09-30 14:35           ` Tony Asleson
  2020-10-01 11:48             ` Greg Kroah-Hartman
  1 sibling, 1 reply; 35+ messages in thread
From: Tony Asleson @ 2020-09-30 14:35 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Christoph Hellwig, linux-scsi, linux-block, linux-ide, Hannes Reinecke

On 9/30/20 2:38 AM, Greg Kroah-Hartman wrote:
> On Tue, Sep 29, 2020 at 05:04:32PM -0500, Tony Asleson wrote:
>> I'm trying to figure out a way to positively identify which storage
>> device an error belongs to over time.
> 
> "over time" is not the kernel's responsibility.
> 
> This comes up every 5 years or so. The kernel provides you, at runtime,
> a mapping between a hardware device and a "logical" device.  It can
> provide information to userspace about this mapping, but once that
> device goes away, the kernel is free to reuse that logical device again.
> 
> If you want to track what logical devices match up to what physical
> device, then do it in userspace, by parsing the log files.

I don't understand why people think it's acceptable to ask user space to
parse text that is subject to change.

>> Thank you for supplying some feedback and asking questions.  I've been
>> asking for suggestions and would very much like to have a discussion on
>> how this issue is best solved.  I'm not attached to what I've provided.
>> I'm just trying to get towards a solution.
> 
> Again, solve this in userspace, you have the information there at
> runtime, why not use it?

We usually don't have the needed information if you remove the
expectation that user space should parse the human readable portion of
the error message.

>> We've looked at user space quite a bit and there is an inherit race
>> condition with trying to fetch the unique hardware id for a message when
>> it gets emitted from the kernel as udev rules haven't even run (assuming
>> we even have the meta-data to make the association).
> 
> But one moment later you do have the information, so you can properly
> correlate it, right?

We could have the information if all the storage paths went through
dev_printk.  Here is what we get today when we encounter a read error
which uses printk in the block layer:

{
        "_HOSTNAME" : "pn",
        "_TRANSPORT" : "kernel",
        "__MONOTONIC_TIMESTAMP" : "1806379233",
        "SYSLOG_IDENTIFIER" : "kernel",
        "_SOURCE_MONOTONIC_TIMESTAMP" : "1805611354",
        "SYSLOG_FACILITY" : "0",
        "MESSAGE" : "blk_update_request: critical medium error, dev
nvme0n1, sector 10000 op 0x0:(READ) flags 0x80700 phys_seg 3 prio class 0",
        "PRIORITY" : "3",
        "_MACHINE_ID" : "3f31a0847cea4c95b7a9cec13d07deeb",
        "__REALTIME_TIMESTAMP" : "1601471260802301",
        "_BOOT_ID" : "b03ed610f21d46ab8243a495ba5a0058",
        "__CURSOR" :
"s=a063a22bbb384da0b0412e8f652deabb;i=23c2;b=b03ed610f21d46ab8243a495ba5a0058;m=6bab28e1;t=5b087959e3cfd;x=20528862f8f765c9"
}

Unless you parse the message text you cannot make the association.  If
the same message was changed to dev_printk we would get:


{
        "__REALTIME_TIMESTAMP" : "1589401901093443",
        "__CURSOR" :
"s=caac9703b34a48fd92f7875adae55a2f;i=1c713;b=e2ae14a9def345aa803a13648b95429c;m=7d25b4f;t=5a58d77b85243;x=b034c2d3fb853870",
        "SYSLOG_IDENTIFIER" : "kernel",
        "_KERNEL_DEVICE" : "b259:917504",
        "__MONOTONIC_TIMESTAMP" : "131226447",
        "_UDEV_SYSNAME" : "nvme0n1",
        "PRIORITY" : "3",
        "_KERNEL_SUBSYSTEM" : "block",
        "_SOURCE_MONOTONIC_TIMESTAMP" : "130941917",
        "_TRANSPORT" : "kernel",
        "_MACHINE_ID" : "3f31a0847cea4c95b7a9cec13d07deeb",
        "_HOSTNAME" : "pn",
        "SYSLOG_FACILITY" : "0",
        "_BOOT_ID" : "e2ae14a9def345aa803a13648b95429c",
        "_UDEV_DEVLINK" : [
                "/dev/disk/by-uuid/22fc262a-d621-452a-a951-7761d9fcf0dc",
                "/dev/disk/by-path/pci-0000:00:05.0-nvme-1",

"/dev/disk/by-id/nvme-nvme.8086-4445414442454546-51454d55204e564d65204374726c-00000001",
                "/dev/disk/by-id/nvme-QEMU_NVMe_Ctrl_DEADBEEF"
        ],
        "MESSAGE" : "block nvme0n1: blk_update_request: critical medium
error, dev nvme0n1, sector 10000 op 0x0:(READ) flags 0x0 phys_seg 1 prio
class 0",
        "_UDEV_DEVNODE" : "/dev/nvme0n1"
}


Journald already knows how to utilize the dev_printk meta data.

One idea that I've suggested along the way is creating a dev_printk
function that doesn't change the message text.  We then avoid breaking
people that are parsing.  Is this something that would be acceptable to
folks?  It doesn't solve early boot where udev rules haven't even run,
but it's better.

Thanks,
Tony


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [v5 01/12] struct device: Add function callback durable_name
  2020-09-30 14:35           ` Tony Asleson
@ 2020-10-01 11:48             ` Greg Kroah-Hartman
  2020-10-07 20:10               ` Tony Asleson
  0 siblings, 1 reply; 35+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-01 11:48 UTC (permalink / raw)
  To: Tony Asleson
  Cc: Christoph Hellwig, linux-scsi, linux-block, linux-ide, Hannes Reinecke

On Wed, Sep 30, 2020 at 09:35:52AM -0500, Tony Asleson wrote:
> On 9/30/20 2:38 AM, Greg Kroah-Hartman wrote:
> > On Tue, Sep 29, 2020 at 05:04:32PM -0500, Tony Asleson wrote:
> >> I'm trying to figure out a way to positively identify which storage
> >> device an error belongs to over time.
> > 
> > "over time" is not the kernel's responsibility.
> > 
> > This comes up every 5 years or so. The kernel provides you, at runtime,
> > a mapping between a hardware device and a "logical" device.  It can
> > provide information to userspace about this mapping, but once that
> > device goes away, the kernel is free to reuse that logical device again.
> > 
> > If you want to track what logical devices match up to what physical
> > device, then do it in userspace, by parsing the log files.
> 
> I don't understand why people think it's acceptable to ask user space to
> parse text that is subject to change.

What text is changing?  The format of of the prefix of dev_*() is well
known and has been stable for 15+ years now, right?  What is difficult
in parsing it?

> >> Thank you for supplying some feedback and asking questions.  I've been
> >> asking for suggestions and would very much like to have a discussion on
> >> how this issue is best solved.  I'm not attached to what I've provided.
> >> I'm just trying to get towards a solution.
> > 
> > Again, solve this in userspace, you have the information there at
> > runtime, why not use it?
> 
> We usually don't have the needed information if you remove the
> expectation that user space should parse the human readable portion of
> the error message.

I don't expect that userspace should have to parse any human readable
portion, if they don't want to.  But if you do want it to, it is pretty
trivial to parse what you have today:

	scsi 2:0:0:0: Direct-Access     Generic  STORAGE DEVICE   1531 PQ: 0 ANSI: 6

If you really have a unique identifier, then great, parse it today:

	usb 4-1.3.1: Product: USB3.0 Card Reader
	usb 4-1.3.1: Manufacturer: Generic
	usb 4-1.3.1: SerialNumber: 000000001531

What's keeping that from working now?

In fact, I would argue that it does seem to work, as there are many
commercial tools out there that seem to handle it just fine...

> >> We've looked at user space quite a bit and there is an inherit race
> >> condition with trying to fetch the unique hardware id for a message when
> >> it gets emitted from the kernel as udev rules haven't even run (assuming
> >> we even have the meta-data to make the association).
> > 
> > But one moment later you do have the information, so you can properly
> > correlate it, right?
> 
> We could have the information if all the storage paths went through
> dev_printk.  Here is what we get today when we encounter a read error
> which uses printk in the block layer:
> 
> {
>         "_HOSTNAME" : "pn",
>         "_TRANSPORT" : "kernel",
>         "__MONOTONIC_TIMESTAMP" : "1806379233",
>         "SYSLOG_IDENTIFIER" : "kernel",
>         "_SOURCE_MONOTONIC_TIMESTAMP" : "1805611354",
>         "SYSLOG_FACILITY" : "0",
>         "MESSAGE" : "blk_update_request: critical medium error, dev
> nvme0n1, sector 10000 op 0x0:(READ) flags 0x80700 phys_seg 3 prio class 0",
>         "PRIORITY" : "3",
>         "_MACHINE_ID" : "3f31a0847cea4c95b7a9cec13d07deeb",
>         "__REALTIME_TIMESTAMP" : "1601471260802301",
>         "_BOOT_ID" : "b03ed610f21d46ab8243a495ba5a0058",
>         "__CURSOR" :
> "s=a063a22bbb384da0b0412e8f652deabb;i=23c2;b=b03ed610f21d46ab8243a495ba5a0058;m=6bab28e1;t=5b087959e3cfd;x=20528862f8f765c9"
> }

Ok, messy stuff, don't do that :)

> Unless you parse the message text you cannot make the association.  If
> the same message was changed to dev_printk we would get:
> 
> 
> {
>         "__REALTIME_TIMESTAMP" : "1589401901093443",
>         "__CURSOR" :
> "s=caac9703b34a48fd92f7875adae55a2f;i=1c713;b=e2ae14a9def345aa803a13648b95429c;m=7d25b4f;t=5a58d77b85243;x=b034c2d3fb853870",
>         "SYSLOG_IDENTIFIER" : "kernel",
>         "_KERNEL_DEVICE" : "b259:917504",
>         "__MONOTONIC_TIMESTAMP" : "131226447",
>         "_UDEV_SYSNAME" : "nvme0n1",
>         "PRIORITY" : "3",
>         "_KERNEL_SUBSYSTEM" : "block",
>         "_SOURCE_MONOTONIC_TIMESTAMP" : "130941917",
>         "_TRANSPORT" : "kernel",
>         "_MACHINE_ID" : "3f31a0847cea4c95b7a9cec13d07deeb",
>         "_HOSTNAME" : "pn",
>         "SYSLOG_FACILITY" : "0",
>         "_BOOT_ID" : "e2ae14a9def345aa803a13648b95429c",
>         "_UDEV_DEVLINK" : [
>                 "/dev/disk/by-uuid/22fc262a-d621-452a-a951-7761d9fcf0dc",
>                 "/dev/disk/by-path/pci-0000:00:05.0-nvme-1",
> 
> "/dev/disk/by-id/nvme-nvme.8086-4445414442454546-51454d55204e564d65204374726c-00000001",
>                 "/dev/disk/by-id/nvme-QEMU_NVMe_Ctrl_DEADBEEF"
>         ],
>         "MESSAGE" : "block nvme0n1: blk_update_request: critical medium
> error, dev nvme0n1, sector 10000 op 0x0:(READ) flags 0x0 phys_seg 1 prio
> class 0",
>         "_UDEV_DEVNODE" : "/dev/nvme0n1"
> }

Great, you have a udev sysname, a kernel subsystem and a way to
associate that with a real device, what more are you wanting?

> Journald already knows how to utilize the dev_printk meta data.

And if you talk to the printk developers (which you seem to be keeping
out of the loop here), they are ripping out the meta data facility as
fast as possible.  So don't rely on extending that please.

> One idea that I've suggested along the way is creating a dev_printk
> function that doesn't change the message text.  We then avoid breaking
> people that are parsing.  Is this something that would be acceptable to
> folks?  It doesn't solve early boot where udev rules haven't even run,
> but it's better.

I still fail to understand the root problem here.

Ok, no, I think I understand what you think the problem is, I just don't
see why it is up to the kernel to change what we have today when there
are lots of tools out there working just fine without any kernel changes
needed.

So try explaining the problem as you see it please, so we all know where
to work from.

But again, cutting out the developers of the subsystems you were wanting
to modify might just be making me really grumpy about this whole
thing...

greg k-h

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [v5 01/12] struct device: Add function callback durable_name
  2020-10-01 11:48             ` Greg Kroah-Hartman
@ 2020-10-07 20:10               ` Tony Asleson
  2020-10-08  4:48                 ` Greg Kroah-Hartman
                                   ` (2 more replies)
  0 siblings, 3 replies; 35+ messages in thread
From: Tony Asleson @ 2020-10-07 20:10 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Christoph Hellwig, linux-scsi, linux-block, linux-ide,
	Hannes Reinecke, pmladek, David Lehman, sergey.senozhatsky,
	jbaron, James.Bottomley, linux-kernel, rafael, martin.petersen,
	kbusch, axboe, sagi, akpm, orson.zhai, viro

On 10/1/20 6:48 AM, Greg Kroah-Hartman wrote:
> On Wed, Sep 30, 2020 at 09:35:52AM -0500, Tony Asleson wrote:
>> On 9/30/20 2:38 AM, Greg Kroah-Hartman wrote:
>>> On Tue, Sep 29, 2020 at 05:04:32PM -0500, Tony Asleson wrote:
>>>> I'm trying to figure out a way to positively identify which storage
>>>> device an error belongs to over time.
>>>
>>> "over time" is not the kernel's responsibility.
>>>
>>> This comes up every 5 years or so. The kernel provides you, at runtime,
>>> a mapping between a hardware device and a "logical" device.  It can
>>> provide information to userspace about this mapping, but once that
>>> device goes away, the kernel is free to reuse that logical device again.
>>>
>>> If you want to track what logical devices match up to what physical
>>> device, then do it in userspace, by parsing the log files.
>>
>> I don't understand why people think it's acceptable to ask user space to
>> parse text that is subject to change.
> 
> What text is changing? The format of of the prefix of dev_*() is well
> known and has been stable for 15+ years now, right?  What is difficult
> in parsing it?

Many of the storage layer messages are using printk, not dev_printk.

>>>> Thank you for supplying some feedback and asking questions.  I've been
>>>> asking for suggestions and would very much like to have a discussion on
>>>> how this issue is best solved.  I'm not attached to what I've provided.
>>>> I'm just trying to get towards a solution.
>>>
>>> Again, solve this in userspace, you have the information there at
>>> runtime, why not use it?
>>
>> We usually don't have the needed information if you remove the
>> expectation that user space should parse the human readable portion of
>> the error message.
> 
> I don't expect that userspace should have to parse any human readable
> portion, if they don't want to.  But if you do want it to, it is pretty
> trivial to parse what you have today:
> 
> 	scsi 2:0:0:0: Direct-Access     Generic  STORAGE DEVICE   1531 PQ: 0 ANSI: 6
> 
> If you really have a unique identifier, then great, parse it today:
> 
> 	usb 4-1.3.1: Product: USB3.0 Card Reader
> 	usb 4-1.3.1: Manufacturer: Generic
> 	usb 4-1.3.1: SerialNumber: 000000001531
> 
> What's keeping that from working now?

I believe these examples are using dev_printk.  With dev_printk we don't
need to parse the text, we can use the meta data.

> In fact, I would argue that it does seem to work, as there are many
> commercial tools out there that seem to handle it just fine...

I'm trying to get something that's works for journalctl.

>>>> We've looked at user space quite a bit and there is an inherit race
>>>> condition with trying to fetch the unique hardware id for a message when
>>>> it gets emitted from the kernel as udev rules haven't even run (assuming
>>>> we even have the meta-data to make the association).
>>>
>>> But one moment later you do have the information, so you can properly
>>> correlate it, right?
>>
>> We could have the information if all the storage paths went through
>> dev_printk.  Here is what we get today when we encounter a read error
>> which uses printk in the block layer:
>>
>> {
>>         "_HOSTNAME" : "pn",
>>         "_TRANSPORT" : "kernel",
>>         "__MONOTONIC_TIMESTAMP" : "1806379233",
>>         "SYSLOG_IDENTIFIER" : "kernel",
>>         "_SOURCE_MONOTONIC_TIMESTAMP" : "1805611354",
>>         "SYSLOG_FACILITY" : "0",
>>         "MESSAGE" : "blk_update_request: critical medium error, dev
>> nvme0n1, sector 10000 op 0x0:(READ) flags 0x80700 phys_seg 3 prio class 0",
>>         "PRIORITY" : "3",
>>         "_MACHINE_ID" : "3f31a0847cea4c95b7a9cec13d07deeb",
>>         "__REALTIME_TIMESTAMP" : "1601471260802301",
>>         "_BOOT_ID" : "b03ed610f21d46ab8243a495ba5a0058",
>>         "__CURSOR" :
>> "s=a063a22bbb384da0b0412e8f652deabb;i=23c2;b=b03ed610f21d46ab8243a495ba5a0058;m=6bab28e1;t=5b087959e3cfd;x=20528862f8f765c9"
>> }
> 
> Ok, messy stuff, don't do that :)
> 
>> Unless you parse the message text you cannot make the association.  If
>> the same message was changed to dev_printk we would get:
>>
>>
>> {
>>         "__REALTIME_TIMESTAMP" : "1589401901093443",
>>         "__CURSOR" :
>> "s=caac9703b34a48fd92f7875adae55a2f;i=1c713;b=e2ae14a9def345aa803a13648b95429c;m=7d25b4f;t=5a58d77b85243;x=b034c2d3fb853870",
>>         "SYSLOG_IDENTIFIER" : "kernel",
>>         "_KERNEL_DEVICE" : "b259:917504",
>>         "__MONOTONIC_TIMESTAMP" : "131226447",
>>         "_UDEV_SYSNAME" : "nvme0n1",
>>         "PRIORITY" : "3",
>>         "_KERNEL_SUBSYSTEM" : "block",
>>         "_SOURCE_MONOTONIC_TIMESTAMP" : "130941917",
>>         "_TRANSPORT" : "kernel",
>>         "_MACHINE_ID" : "3f31a0847cea4c95b7a9cec13d07deeb",
>>         "_HOSTNAME" : "pn",
>>         "SYSLOG_FACILITY" : "0",
>>         "_BOOT_ID" : "e2ae14a9def345aa803a13648b95429c",
>>         "_UDEV_DEVLINK" : [
>>                 "/dev/disk/by-uuid/22fc262a-d621-452a-a951-7761d9fcf0dc",
>>                 "/dev/disk/by-path/pci-0000:00:05.0-nvme-1",
>>
>> "/dev/disk/by-id/nvme-nvme.8086-4445414442454546-51454d55204e564d65204374726c-00000001",
>>                 "/dev/disk/by-id/nvme-QEMU_NVMe_Ctrl_DEADBEEF"
>>         ],
>>         "MESSAGE" : "block nvme0n1: blk_update_request: critical medium
>> error, dev nvme0n1, sector 10000 op 0x0:(READ) flags 0x0 phys_seg 1 prio
>> class 0",
>>         "_UDEV_DEVNODE" : "/dev/nvme0n1"
>> }
> 
> Great, you have a udev sysname, a kernel subsystem and a way to
> associate that with a real device, what more are you wanting?

Did you miss in my example where it's currently a printk?  I showed what
it would look like if it was a dev_printk.

Journald is using _KERNEL_DEVICE to add the _UDEV_DEVLINK information to
the journal entry, it's not parsing the prefix of the message.

The above json is outputted from journalctl when you specify "-o
json-pretty".

>> Journald already knows how to utilize the dev_printk meta data.
> 
> And if you talk to the printk developers (which you seem to be keeping
> out of the loop here), they are ripping out the meta data facility as
> fast as possible.  So don't rely on extending that please.

Again, I'm not trying to keep anyone out of the loop.  Last I knew the
meta data capability wasn't being removed, maybe this has changed?

Ref.

https://lore.kernel.org/lkml/20191007120134.ciywr3wale4gxa6v@pathway.suse.cz/


>> One idea that I've suggested along the way is creating a dev_printk
>> function that doesn't change the message text.  We then avoid breaking
>> people that are parsing.  Is this something that would be acceptable to
>> folks?  It doesn't solve early boot where udev rules haven't even run,
>> but it's better.
> 
> I still fail to understand the root problem here.

IMHO one of the root problems is that many storage messages are still
using printk.  Changing messages to dev_printk has been met with resistance.


> Ok, no, I think I understand what you think the problem is, I just don't
> see why it is up to the kernel to change what we have today when there
> are lots of tools out there working just fine without any kernel changes
> needed.
> 
> So try explaining the problem as you see it please, so we all know where
> to work from.

To me the problem today is the kernel logs information to identify how a
storage device is attached, not what is attached.  I think you agree
with this statement.  The log information is not helpful without the
information to correlate to the actual device.  I think you also agree
with this too as you have mentioned it's user space's responsibility to
collect this so that the correlation can be done.

If the following are *both* true, we have a usable message that has the
correlated data with it in the journal.

1. The storage related kernel message goes through dev_printk
2. At the time of the message being emitted the device symlinks are present.

When those two things are both true, journalctl can do the following
(today):

$ journalctl /dev/disk/by-id/wwn-0x5002538844584d30


However, it usually can't because the above two things are rarely both
true at the same time for a given message for journald when it logs to
the journal.

You keep saying this is a user space issue, but I believe we still need
a bit of help from the kernel at the very least by migrating to
dev_printk or something similar that adds the same meta data without
changing the message text.

Yes, my patch series went one step further and added the device ID as
structured data to the log message, but I was also trying to minimize
the race condition between the kernel emitting a message and journald
not having the information to associate it to the hardware device.

If people have other suggestions please let them be known.

> But again, cutting out the developers of the subsystems you were wanting
> to modify might just be making me really grumpy about this whole
> thing...

Again, I'm sorry I didn't reach out to the correct people.  Hopefully
I've CC'd everyone that is appropriate for this discussion.


Thanks,
Tony


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [v5 01/12] struct device: Add function callback durable_name
  2020-10-07 20:10               ` Tony Asleson
@ 2020-10-08  4:48                 ` Greg Kroah-Hartman
  2020-10-08 20:49                   ` Martin K. Petersen
  2020-10-08  5:54                 ` Hannes Reinecke
  2020-10-08  6:22                 ` Finn Thain
  2 siblings, 1 reply; 35+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-08  4:48 UTC (permalink / raw)
  To: Tony Asleson
  Cc: Christoph Hellwig, linux-scsi, linux-block, linux-ide,
	Hannes Reinecke, pmladek, David Lehman, sergey.senozhatsky,
	jbaron, James.Bottomley, linux-kernel, rafael, martin.petersen,
	kbusch, axboe, sagi, akpm, orson.zhai, viro

On Wed, Oct 07, 2020 at 03:10:17PM -0500, Tony Asleson wrote:
> On 10/1/20 6:48 AM, Greg Kroah-Hartman wrote:
> > On Wed, Sep 30, 2020 at 09:35:52AM -0500, Tony Asleson wrote:
> >> On 9/30/20 2:38 AM, Greg Kroah-Hartman wrote:
> >>> On Tue, Sep 29, 2020 at 05:04:32PM -0500, Tony Asleson wrote:
> >>>> I'm trying to figure out a way to positively identify which storage
> >>>> device an error belongs to over time.
> >>>
> >>> "over time" is not the kernel's responsibility.
> >>>
> >>> This comes up every 5 years or so. The kernel provides you, at runtime,
> >>> a mapping between a hardware device and a "logical" device.  It can
> >>> provide information to userspace about this mapping, but once that
> >>> device goes away, the kernel is free to reuse that logical device again.
> >>>
> >>> If you want to track what logical devices match up to what physical
> >>> device, then do it in userspace, by parsing the log files.
> >>
> >> I don't understand why people think it's acceptable to ask user space to
> >> parse text that is subject to change.
> > 
> > What text is changing? The format of of the prefix of dev_*() is well
> > known and has been stable for 15+ years now, right?  What is difficult
> > in parsing it?
> 
> Many of the storage layer messages are using printk, not dev_printk.

Ok, then stop right there.  Fix that up.  Don't try to route around the
standard way of displaying log messages by creating a totally different
way of doing things.

Just use the dev_*() calls, and all will be fine.  Kernel log messages
are not "ABI" in that they have to be preserved in any specific way, so
adding a prefix to them as dev_*() does, will be fine.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [v5 01/12] struct device: Add function callback durable_name
  2020-10-07 20:10               ` Tony Asleson
  2020-10-08  4:48                 ` Greg Kroah-Hartman
@ 2020-10-08  5:54                 ` Hannes Reinecke
  2020-10-08  6:22                 ` Finn Thain
  2 siblings, 0 replies; 35+ messages in thread
From: Hannes Reinecke @ 2020-10-08  5:54 UTC (permalink / raw)
  To: tasleson, Greg Kroah-Hartman
  Cc: Christoph Hellwig, linux-scsi, linux-block, linux-ide, pmladek,
	David Lehman, sergey.senozhatsky, jbaron, James.Bottomley,
	linux-kernel, rafael, martin.petersen, kbusch, axboe, sagi, akpm,
	orson.zhai, viro

On 10/7/20 10:10 PM, Tony Asleson wrote:
> On 10/1/20 6:48 AM, Greg Kroah-Hartman wrote:
>> On Wed, Sep 30, 2020 at 09:35:52AM -0500, Tony Asleson wrote:
>>> On 9/30/20 2:38 AM, Greg Kroah-Hartman wrote:
>>>> On Tue, Sep 29, 2020 at 05:04:32PM -0500, Tony Asleson wrote:
>>>>> I'm trying to figure out a way to positively identify which storage
>>>>> device an error belongs to over time.
>>>>
>>>> "over time" is not the kernel's responsibility.
>>>>
>>>> This comes up every 5 years or so. The kernel provides you, at runtime,
>>>> a mapping between a hardware device and a "logical" device.  It can
>>>> provide information to userspace about this mapping, but once that
>>>> device goes away, the kernel is free to reuse that logical device again.
>>>>
>>>> If you want to track what logical devices match up to what physical
>>>> device, then do it in userspace, by parsing the log files.
>>>
>>> I don't understand why people think it's acceptable to ask user space to
>>> parse text that is subject to change.
>>
>> What text is changing? The format of of the prefix of dev_*() is well
>> known and has been stable for 15+ years now, right?  What is difficult
>> in parsing it?
> 
> Many of the storage layer messages are using printk, not dev_printk.
> 
So that would be the immediate angle of attack ...

>>>>> Thank you for supplying some feedback and asking questions.  I've been
>>>>> asking for suggestions and would very much like to have a discussion on
>>>>> how this issue is best solved.  I'm not attached to what I've provided.
>>>>> I'm just trying to get towards a solution.
>>>>
>>>> Again, solve this in userspace, you have the information there at
>>>> runtime, why not use it?
>>>
>>> We usually don't have the needed information if you remove the
>>> expectation that user space should parse the human readable portion of
>>> the error message.
>>
>> I don't expect that userspace should have to parse any human readable
>> portion, if they don't want to.  But if you do want it to, it is pretty
>> trivial to parse what you have today:
>>
>> 	scsi 2:0:0:0: Direct-Access     Generic  STORAGE DEVICE   1531 PQ: 0 ANSI: 6
>>
>> If you really have a unique identifier, then great, parse it today:
>>
>> 	usb 4-1.3.1: Product: USB3.0 Card Reader
>> 	usb 4-1.3.1: Manufacturer: Generic
>> 	usb 4-1.3.1: SerialNumber: 000000001531
>>
>> What's keeping that from working now?
> 
> I believe these examples are using dev_printk.  With dev_printk we don't
> need to parse the text, we can use the meta data.
> So it looks as most of your usecase would be solved by moving to 
dev_printk().
Why not work on that instead?
I do presume this will have immediate benefits for everybody, and will 
have approval from everyone.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [v5 01/12] struct device: Add function callback durable_name
  2020-10-07 20:10               ` Tony Asleson
  2020-10-08  4:48                 ` Greg Kroah-Hartman
  2020-10-08  5:54                 ` Hannes Reinecke
@ 2020-10-08  6:22                 ` Finn Thain
  2 siblings, 0 replies; 35+ messages in thread
From: Finn Thain @ 2020-10-08  6:22 UTC (permalink / raw)
  To: Tony Asleson
  Cc: Greg Kroah-Hartman, Christoph Hellwig, linux-scsi, linux-block,
	linux-ide, Hannes Reinecke, pmladek, David Lehman,
	sergey.senozhatsky, jbaron, James.Bottomley, linux-kernel,
	rafael, martin.petersen, kbusch, axboe, sagi, akpm, orson.zhai,
	viro

On Wed, 7 Oct 2020, Tony Asleson wrote:

> The log information is not helpful without the information to correlate 
> to the actual device.

Log messages that associate one entity with another can be generated 
whenever such an association comes into existence, which is probably when 
devices get probed.

E.g. a host:channel:target:lun identifier gets associated with a block 
device name by the dev_printk() calls in sd_probe():

[    3.600000] sd 0:0:0:0: [sda] Attached SCSI disk

BTW, if you think of {"0:0:0:0","sda"} as a row in some normalized table 
and squint a bit, this problem is not unlike the replication of database 
tables over a message queue.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [v5 01/12] struct device: Add function callback durable_name
  2020-10-08  4:48                 ` Greg Kroah-Hartman
@ 2020-10-08 20:49                   ` Martin K. Petersen
  0 siblings, 0 replies; 35+ messages in thread
From: Martin K. Petersen @ 2020-10-08 20:49 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Tony Asleson, Christoph Hellwig, linux-scsi, linux-block,
	linux-ide, Hannes Reinecke, pmladek, David Lehman,
	sergey.senozhatsky, jbaron, James.Bottomley, linux-kernel,
	rafael, martin.petersen, kbusch, axboe, sagi, akpm, orson.zhai,
	viro


Greg,

>> > What text is changing? The format of of the prefix of dev_*() is well
>> > known and has been stable for 15+ years now, right?  What is difficult
>> > in parsing it?
>> 
>> Many of the storage layer messages are using printk, not dev_printk.
>
> Ok, then stop right there.  Fix that up.  Don't try to route around the
> standard way of displaying log messages by creating a totally different
> way of doing things.

Couldn't agree more!

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2020-10-08 20:50 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-25 16:19 [v5 00/12] Add persistent durable identifier to storage log messages Tony Asleson
2020-09-25 16:19 ` [v5 01/12] struct device: Add function callback durable_name Tony Asleson
2020-09-26  9:08   ` Sergei Shtylyov
2020-09-27 14:22     ` Tony Asleson
2020-09-27 16:15       ` Sergei Shtylyov
2020-09-29 17:51   ` Christoph Hellwig
2020-09-29 18:04     ` Greg Kroah-Hartman
2020-09-29 22:04       ` Tony Asleson
2020-09-30  7:38         ` Greg Kroah-Hartman
2020-09-30  7:40           ` Greg Kroah-Hartman
2020-09-30 14:35           ` Tony Asleson
2020-10-01 11:48             ` Greg Kroah-Hartman
2020-10-07 20:10               ` Tony Asleson
2020-10-08  4:48                 ` Greg Kroah-Hartman
2020-10-08 20:49                   ` Martin K. Petersen
2020-10-08  5:54                 ` Hannes Reinecke
2020-10-08  6:22                 ` Finn Thain
2020-09-25 16:19 ` [v5 02/12] create_syslog_header: Add durable name Tony Asleson
2020-09-25 16:19 ` [v5 03/12] dev_vprintk_emit: Increase hdr size Tony Asleson
2020-09-25 16:19 ` [v5 04/12] scsi: Add durable_name for dev_printk Tony Asleson
2020-09-25 16:19 ` [v5 05/12] nvme: Add durable name " Tony Asleson
2020-09-25 16:19 ` [v5 06/12] libata: Add ata_scsi_durable_name Tony Asleson
2020-09-25 16:19 ` [v5 07/12] libata: Make ata_scsi_durable_name static Tony Asleson
2020-09-26  8:40   ` Sergei Shtylyov
2020-09-26 14:17     ` Tony Asleson
2020-09-26 15:57       ` James Bottomley
2020-09-28 20:35     ` Tony Asleson
2020-09-25 16:19 ` [v5 08/12] Add durable_name_printk Tony Asleson
2020-09-26 23:53   ` Randy Dunlap
2020-09-28 15:52     ` Tony Asleson
2020-09-28 17:32       ` Randy Dunlap
2020-09-25 16:19 ` [v5 09/12] libata: use durable_name_printk Tony Asleson
2020-09-25 16:19 ` [v5 10/12] Add durable_name_printk_ratelimited Tony Asleson
2020-09-25 16:19 ` [v5 11/12] print_req_error: Use durable_name_printk_ratelimited Tony Asleson
2020-09-25 16:19 ` [v5 12/12] buffer_io_error: " Tony Asleson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).