Linux-ide Archive on lore.kernel.org
From: Tony Asleson <tasleson@redhat.com>
To: dgilbert@interlog.com, Hannes Reinecke <hare@suse.de>,
	Christoph Hellwig <hch@infradead.org>
Cc: linux-block@vger.kernel.org, linux-ide@vger.kernel.org,
	linux-scsi@vger.kernel.org, b.zolnierkie@samsung.com,
	axboe@kernel.dk
Subject: Re: [v4 00/11] Add persistent durable identifier to storage log messages
Date: Mon, 27 Jul 2020 15:27:39 -0500
Message-ID: <6973893a-eda8-6128-b484-7c89c1dc5070@redhat.com>
In-Reply-To: <90798655-0ee1-330f-cae4-937c4981563a@interlog.com>

On 7/27/20 2:17 PM, Douglas Gilbert wrote:
> On 2020-07-27 1:42 p.m., Tony Asleson wrote:
>> On 7/27/20 11:46 AM, Hannes Reinecke wrote:
>>> On 7/27/20 5:45 PM, Tony Asleson wrote:
>>>> On 7/26/20 10:10 AM, Christoph Hellwig wrote:
>>>>> FYI, I think these identifiers are absolutely horrible and have no
>>>>> business in dmesg:
>>>>
>>>> The identifiers are structured data, they're not visible unless you go
>>>> looking for them.
>>>>
>>>> I'm open to other suggestions on how we can positively identify storage
>>>> devices over time, across reboots, replacement, and dynamic
>>>> reconfiguration.
>>>>
>>>> My home system has 4 disks, 2 are identical except for serial number.
>>>> Even with this simple configuration, it's not trivial to identify which
>>>> message goes with which disk across reboots.
>>>>
>>> Well; the more important bits would be to identify the physical location
>>> where these disks reside.
>>> If one goes bad it doesn't really help you if you have a persistent
>>> identification in the OS; what you really need is a physical indicator
>>> or a physical location allowing you to identify which disk to pull.
>>
>> In my use case I have no slot information.  I have no SCSI enclosure
>> services to toggle identification LEDs or fault LEDs for the drive sled.
>> For some users the device might be a virtual one in a storage server,
>> vpd helps.
>>
>> In my case the in kernel vpd (WWN) data can be used to correlate with
>> the sticker on the disk as the disks have the WWN printed on them.  I
>> would think this is true for most disks/storage devices, but obviously I
>> can't make that statement with 100% certainty as I have a small sample
>> size.
>>
>>> Which isn't addressed at all with this patchset (nor should it; the
>>> physical location is typically found via other means).
>>>
>>> And for the other use-cases: We do have persistent device links, do we
>>> not?
>>
>> How does /dev/disk/by-* help when you are looking at the journal from 1
>> or more reboots ago and the only thing you have in your journal is
>> something like:
>>
>> blk_update_request: critical medium error, dev sde, sector 43578 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
>>
>> The links are only valid for right now.
> 
> Does:
>    lsscsi -U
> or
>    lsscsi -UU
> 
> solve your problem, or come close?
> 
> Example:
> # lsscsi -UU
> [1:0:0:0]    disk    naa.5000cca02b38d0b8  /dev/sda
> [1:0:1:0]    disk    naa.5000c5003011cb2b  /dev/sdb
> [1:0:2:0]    enclosu naa.5001b4d516ecc03f  -
> [N:0:1:1]    disk    eui.e8238fa6bf530001001b448b46bd5525    /dev/nvme0n1
> 
> The first two (SAS SSDs) NAAs are printed on the disk labels. I don't
> think either that enclosure or the M2 NVMe SSD have their numbers
> visible (i.e. the last two lines of output).
> 
> If it is what you want, then perhaps you could arrange for its output
> to be sent to the log when the system has stabilized after a reboot. That
> would only leave disk hotplug events exposed.

Yes, if we write a new udev rule or script, we could place breadcrumbs
in the journal so we can correlate sda == naa.5000cca02b38d0b8 at the
time of the error.  However, none of the existing tooling will let you
easily see all the log messages that pertain to the device.  The user is
still left with a bunch of log messages that refer to the same device
attachment in different ways, e.g. sda, ata1.00, sd 0:0:0:0.  Working
out which messages go with which device is not trivial.  Even if someone
writes a tool that parses the messages, looks for the string that
contains the ID, and has the decoder information needed to associate it
with the correct piece of hardware, it's only good until the message
changes in the kernel.
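
As a rough illustration of the breadcrumb idea, the sketch below parses
output shaped like the `lsscsi -UU` example quoted above into a
device-node-to-identifier map that a script could then write to the
journal.  The function name and the column layout it expects are
assumptions taken from that one example, not a description of lsscsi's
full output format.

```python
def parse_lsscsi_uu(output):
    """Map each device node (e.g. /dev/sda) to the identifier printed for it.

    Assumes four whitespace-separated columns per line, as in the quoted
    example: [H:C:T:L], device type, identifier, device node.
    """
    mapping = {}
    for line in output.splitlines():
        fields = line.split()
        # Skip short lines and entries without a device node, such as the
        # enclosure line, whose last column is "-".
        if len(fields) < 4 or not fields[-1].startswith("/dev/"):
            continue
        mapping[fields[-1]] = fields[2]
    return mapping


# Sample input copied from the lsscsi -UU output quoted in this thread.
SAMPLE = """\
[1:0:0:0]    disk    naa.5000cca02b38d0b8  /dev/sda
[1:0:1:0]    disk    naa.5000c5003011cb2b  /dev/sdb
[1:0:2:0]    enclosu naa.5001b4d516ecc03f  -
[N:0:1:1]    disk    eui.e8238fa6bf530001001b448b46bd5525    /dev/nvme0n1
"""

for node, ident in sorted(parse_lsscsi_uu(SAMPLE).items()):
    print(f"{node} == {ident}")
```

Such a map is only a snapshot: it says nothing about messages logged
under other names for the same attachment, which is the gap described
above.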

If we stuff the de facto ID into the message itself when it occurs, the
ambiguity about which device is associated with a message is removed.
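
To make that concrete, here is a minimal sketch of the lookup this
enables once every record carries the ID as structured data.  The
records are plain dicts shaped like journal entries with a MESSAGE
field; the `DURABLE_NAME` key, the function name, and the placeholder
message texts are all hypothetical and chosen only for this example.

```python
def messages_for_device(records, durable_name, key="DURABLE_NAME"):
    """Return the MESSAGE text of every record tagged with durable_name.

    `key` is a hypothetical metadata field name used for illustration.
    """
    return [r["MESSAGE"] for r in records if r.get(key) == durable_name]


# Placeholder records: the message texts are invented stand-ins, each
# naming the device in a different way, as described above.
records = [
    {"MESSAGE": "placeholder text naming sda",
     "DURABLE_NAME": "naa.5000cca02b38d0b8"},
    {"MESSAGE": "placeholder text naming ata1.00",
     "DURABLE_NAME": "naa.5000cca02b38d0b8"},
    {"MESSAGE": "placeholder text naming sdb",
     "DURABLE_NAME": "naa.5000c5003011cb2b"},
    {"MESSAGE": "placeholder text with no device metadata"},
]

for msg in messages_for_device(records, "naa.5000cca02b38d0b8"):
    print(msg)
```

The filter never inspects the message text, so it keeps working when
the wording of a kernel message changes.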

I would like to know why this is so horrible.  Is it processing time in
an error path?  Stack usage holding the data in flight?  Wasted disk
space?  Are unique identifiers just too long and terse?

The only valid reason I can think of is someone working with very
sensitive data and not wanting the unique ID of a removable or network
storage device to be in their logs.  Of course we could add a boot-time
option to disable it, or make it off by default for those who don't
want it or don't care.

> Faced with the above medium error I would try:
>    dd if=<all_possibles> bs=512 skip=43578 iflag=direct of=/dev/null count=1
> and look for noise in the logs. Change 'bs=512' up to 4096 if that is
> the logical block size. For <all_possibles> use /dev/sde (and /dev/sdf and
> /dev/sdg or whatever) IOWs the _whole_ disk device name.

Assuming the error reproduces, this would work.  However, I think this
solution speaks volumes for how difficult it is to simply identify what
device an error originated from.
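
For reference, a small sketch of the arithmetic behind that dd probe.
It assumes the sector number in the block-layer message counts 512-byte
units regardless of the device's logical block size, so the skip value
has to be rescaled when bs changes.  The helper name is hypothetical,
and it only builds the argument list; it does not run dd.

```python
KERNEL_SECTOR_BYTES = 512  # assumption: logged sector numbers use 512-byte units


def dd_read_args(device, sector, block_size=512):
    """Build a dd argument list that reads the block containing `sector`."""
    byte_offset = sector * KERNEL_SECTOR_BYTES
    skip = byte_offset // block_size  # dd's skip= is counted in units of bs
    return [
        "dd",
        f"if={device}",
        "of=/dev/null",
        f"bs={block_size}",
        f"skip={skip}",
        "count=1",
        "iflag=direct",
    ]


# One candidate command per whole-disk device name from the suggestion above.
for dev in ("/dev/sde", "/dev/sdf", "/dev/sdg"):
    print(" ".join(dd_read_args(dev, 43578, block_size=4096)))
```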

-Tony


