Linux-Block Archive on lore.kernel.org
 help / color / Atom feed
From: Tony Asleson <tasleson@redhat.com>
To: linux-scsi@vger.kernel.org, linux-block@vger.kernel.org
Subject: [RFC PATCH v3 0/8] Add persistent durable identifier to storage log messages
Date: Tue, 23 Jun 2020 14:17:41 -0500
Message-ID: <20200623191749.115200-1-tasleson@redhat.com> (raw)

Today users have no easy way to correlate kernel log messages for storage
devices across reboots, device dynamic add/remove, or when the device is
physically or logically moved from from system to system.  This is due
to the existing log IDs which identify how the device is attached and not
a unique ID of what is attached.  Additionally, even when the attachment
hasn't changed, it's not always obvious which messages belong to the
device as the different areas in the storage stack use different
identifiers, eg. (sda, sata1.00, sd 0:0:0:0).

This change addresses this by adding a unique ID to each log
message.  It couples the existing structured key/value logging capability
and VPD 0x83 device identification.


An example of logs filtered for a specific device utilizing this patch
series.

$ journalctl -b  _KERNEL_DURABLE_NAME="`cat /sys/block/sdb/device/wwid`"
| cut -c 25- | fmt -t
9-08-22 13:21:35 CDT, end at Wed 2020-05-13 15:40:26 CDT. --
l: scsi 1:0:0:0: Attached scsi generic sg1 type 0
l: sd 1:0:0:0: [sdb] 209715200 512-byte logical blocks: (107 GB/100 GiB)
l: sd 1:0:0:0: [sdb] Write Protect is off
l: sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
l: sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't
   support DPO or FUA
l: sd 1:0:0:0: [sdb] Attached SCSI disk
l: sd 1:0:0:0: ata2.00: exception Emask 0x0 SAct 0x800000 SErr 0x800000
   action 0x6 frozen
l: sd 1:0:0:0: ata2.00: failed command: READ FPDMA QUEUED
l: sd 1:0:0:0: ata2.00: cmd 60/20:b8:10:27:00/00:00:00:00:00/40 tag 23
            ncq dma 16384 in res 40/00:00:00:4f:c2/00:00:00:00:00/00
            Emask 0x4 (timeout)
l: sd 1:0:0:0: ata2.00: status: { DRDY }
l: sd 1:0:0:0: ata2.00: configured for UDMA/100
l: sd 1:0:0:0: [sdb] tag#23 FAILED Result: hostbyte=DID_OK
            driverbyte=DRIVER_SENSE cmd_age=30s
l: sd 1:0:0:0: [sdb] tag#23 Sense Key : Illegal Request [current]
l: sd 1:0:0:0: [sdb] tag#23 Add. Sense: Unaligned write command
l: sd 1:0:0:0: [sdb] tag#23 CDB: Read(10) 28 00 00 00 27 10 00 00 20 00
l: block sdb: blk_update_request: I/O error, dev sdb, sector 10000 op
            0x0:(READ) flags 0x80700 phys_seg 4 prio class 0
l: sd 1:0:0:0: ata2.00: exception Emask 0x0 SAct 0x2 SErr 0x2 action
            0x6 frozen
l: sd 1:0:0:0: ata2.00: failed command: READ FPDMA QUEUED
l: sd 1:0:0:0: ata2.00: cmd 60/08:08:10:27:00/00:00:00:00:00/40 tag 1 ncq
            dma 4096 in res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4
            (timeout)
l: sd 1:0:0:0: ata2.00: status: { DRDY }
l: sd 1:0:0:0: ata2.00: configured for UDMA/100
l: sd 1:0:0:0: ata2.00: exception Emask 0x0 SAct 0x800000 SErr 0x800000
            action 0x6 frozen
l: sd 1:0:0:0: ata2.00: failed command: READ FPDMA QUEUED
l: sd 1:0:0:0: ata2.00: cmd 60/08:b8:10:27:00/00:00:00:00:00/40 tag 23 ncq
            dma 4096 in res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4
            (timeout)
l: sd 1:0:0:0: ata2.00: status: { DRDY }
l: sd 1:0:0:0: ata2.00: configured for UDMA/100
l: sd 1:0:0:0: ata2.00: NCQ disabled due to excessive errors
l: sd 1:0:0:0: ata2.00: exception Emask 0x0 SAct 0x4000 SErr 0x4000
            action 0x6 frozen
l: sd 1:0:0:0: ata2.00: failed command: READ FPDMA QUEUED
l: sd 1:0:0:0: ata2.00: cmd 60/08:70:10:27:00/00:00:00:00:00/40 tag 14 ncq
            dma 4096 in res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4
            (timeout)
l: sd 1:0:0:0: ata2.00: status: { DRDY }
l: sd 1:0:0:0: ata2.00: configured for UDMA/100
l: sd 1:0:0:0: ata2.00: device reported invalid CHS sector 0

This change is incomplete.  With the plethora of different logging
techniques utilized in the kernel it will take some coordinated effort
and additional changes.  I tried a few different approaches, to cover
as much as I could without resorting to changing every print statement
in all the storage layers, but maybe there is a better,
more elegant approach?

I believe having this functionality is very useful and important for
system configurations of all sizes.  I mentioned this change briefly in:
https://lore.kernel.org/lkml/30f29fe6-8445-0016-8cdc-3ef99d43fbf5@redhat.com/

Questions
1. Where is the best place to put the durable_name callback function?
2. What is best "KEY" value?
3. Should we re-work the message contents themselves to remove any redundant
   information that is included when using dev_printk? ref. 
   https://lore.kernel.org/linux-scsi/e12aeb9e-fe5d-5b5e-d190-401997cecc34@redhat.com/#t


v2:
- Incorporated changes suggested by James Bottomley
- Removed string function which removed leading/trailing/duplicate adjacent
  spaces from generated id, value matches /sys/block/<device>/device/wwid
- Remove xfs patch, limiting changes to lower block layers
- Moved callback from struct device_type to struct device.  Struct device_type
  is typically static const and with a number of different areas using shared
  implementation of genhd unable to modify for each of the different areas.

v3:
- Increase the size of the buffers for NVMe id generation and
  dev_vprintk_emit

Tony Asleson (8):
  struct device: Add function callback durable_name
  create_syslog_header: Add durable name
  print_req_error: Use dev_printk
  buffer_io_error: Use dev_printk
  ata_dev_printk: Use dev_printk
  scsi: Add durable_name for dev_printk
  nvme: Add durable name for dev_printk
  dev_vprintk_emit: Increase hdr size

 block/blk-core.c           |  5 ++++-
 drivers/ata/libata-core.c  | 10 +++++++---
 drivers/base/core.c        | 31 ++++++++++++++++++++++++++++++-
 drivers/nvme/host/core.c   | 18 ++++++++++++++++++
 drivers/scsi/scsi_lib.c    | 14 ++++++++++++++
 drivers/scsi/scsi_sysfs.c  | 23 +++++++++++++++++++++++
 drivers/scsi/sd.c          |  2 ++
 fs/buffer.c                | 10 ++++++++--
 include/linux/device.h     |  4 ++++
 include/scsi/scsi_device.h |  3 +++
 10 files changed, 113 insertions(+), 7 deletions(-)


base-commit: 3d77e6a8804abcc0504c904bd6e5cdf3a5cf8162
-- 
2.25.4


             reply index

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-23 19:17 Tony Asleson [this message]
2020-06-23 19:17 ` [RFC PATCH v3 1/8] struct device: Add function callback durable_name Tony Asleson
2020-06-23 19:17 ` [RFC PATCH v3 2/8] create_syslog_header: Add durable name Tony Asleson
2020-06-23 19:17 ` [RFC PATCH v3 3/8] print_req_error: Use dev_printk Tony Asleson
2020-06-23 19:17 ` [RFC PATCH v3 4/8] buffer_io_error: " Tony Asleson
2020-06-23 19:17 ` [RFC PATCH v3 5/8] ata_dev_printk: " Tony Asleson
     [not found]   ` <CGME20200624103532eucas1p2c0988207e4dfc2f992d309b75deac3ee@eucas1p2.samsung.com>
2020-06-24 10:35     ` Bartlomiej Zolnierkiewicz
2020-06-24 15:15       ` Tony Asleson
2020-06-26 12:45         ` Bartlomiej Zolnierkiewicz
2020-06-26 13:54           ` Tony Asleson
     [not found]       ` <7ed08b94-755f-baab-0555-b4e454405729@redhat.com>
2020-07-14  8:06         ` Bartlomiej Zolnierkiewicz
2020-07-14  8:17           ` Greg Kroah-Hartman
2020-07-14  8:50             ` Bartlomiej Zolnierkiewicz
2020-07-17 10:06               ` Greg Kroah-Hartman
2020-07-17 10:17                 ` Hannes Reinecke
2020-07-17 10:27                 ` Bartlomiej Zolnierkiewicz
2020-07-17 19:47                   ` Tony Asleson
2020-07-24  8:50                     ` Bartlomiej Zolnierkiewicz
2020-06-23 19:17 ` [RFC PATCH v3 6/8] scsi: Add durable_name for dev_printk Tony Asleson
2020-06-23 19:17 ` [RFC PATCH v3 7/8] nvme: Add durable name " Tony Asleson
2020-06-23 20:04   ` Chaitanya Kulkarni
2020-06-23 20:32     ` Tony Asleson
2020-06-23 19:17 ` [RFC PATCH v3 8/8] dev_vprintk_emit: Increase hdr size Tony Asleson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200623191749.115200-1-tasleson@redhat.com \
    --to=tasleson@redhat.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-scsi@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-Block Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-block/0 linux-block/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-block linux-block/ https://lore.kernel.org/linux-block \
		linux-block@vger.kernel.org
	public-inbox-index linux-block

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-block


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git