* Improper Naming in /dev/disk/by-id and Drives Offline
@ 2014-09-11  2:34 Brandon R Schwartz
  2014-09-11  2:53 ` Greg KH
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: Brandon R Schwartz @ 2014-09-11  2:34 UTC (permalink / raw)
  To: linux-hotplug

Hi,

I'm working on a particular issue (possibly two separate issues) where
our HDDs are (1) getting mislabeled in /dev/disk/by-id and (2)
dropping offline even though drive and controller logs show that the
drive is communicating and working as expected.  I don't have much
knowledge on the udev side of things so it would be great if someone
could offer some insight into the way udev assigns device names and if
there are thoughts as to why the OS cannot see the drive in certain
cases (timing issue?).

The first issue, the mislabeling problem, is that on reboots or power
cycles we occasionally see our drives become mislabeled in
/dev/disk/by-id.  We expect to see something like:

ata-ST3000DM001-1CH166_W1F26HKK
ata-ST3000DM001-1CH166_Z1F2FBBY

But instead we see:

ata-ST3000DM001-1CH166_W1F26HKK
scsi-35000c500668a9bdb

The "scsi" drive is assigned a drive letter and the OS can communicate
with the drive.  Drive logs and controller logs show the drive is
working properly, but for some reason it's getting labeled incorrectly
in /dev/disk/by-id.  We have looked through dmesg and enabled logging
in udev (udevadm control --log-priority=debug), but we have not seen
where these labels are coming from.
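
One way to spot the affected drives after each reboot is to group the
/dev/disk/by-id symlinks by the device node they resolve to and flag any
node that has no "ata-" link. The following is a minimal sketch, not part
of the original report: the function names are made up for illustration,
and it assumes Python 3 is available on the test machine.

```python
import os
from collections import defaultdict

BY_ID_DIR = "/dev/disk/by-id"


def read_by_id_links(directory=BY_ID_DIR):
    """Return {link_name: resolved_target} for every symlink in `directory`."""
    links = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.islink(path):
            links[name] = os.path.realpath(path)
    return links


def devices_missing_prefix(links, wanted_prefix="ata-"):
    """Return {target: [link names]} for targets with no link that starts
    with `wanted_prefix` (hypothetical helper, for illustration only)."""
    by_target = defaultdict(list)
    for name, target in links.items():
        by_target[target].append(name)
    return {
        target: sorted(names)
        for target, names in by_target.items()
        if not any(n.startswith(wanted_prefix) for n in names)
    }


if __name__ == "__main__":
    # Only touch the real directory when it exists on this machine.
    if os.path.isdir(BY_ID_DIR):
        missing = devices_missing_prefix(read_by_id_links())
        for target, names in sorted(missing.items()):
            print(f"{target}: no ata- link, only {', '.join(names)}")
```

Running this from a boot-time script and logging the result would at
least timestamp each occurrence, which may help correlate the bad boots
with dmesg and the udev debug log.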

The second issue is slightly related to the first in that it appears
during the same power cycle/reboot test.  We have noticed that on
occasion, our drives will not be detected by the OS (not listed in
/dev/disk/by-id) at all.  However, if we look at drive logs and
controller logs, we don't see any issue.  The controller is able to
see the drives and communicate with them, but the OS is unable to.
Any ideas as to why communication is not established?

Also, is there a way to refresh the /dev/disk/by-id listing (udevadm
trigger?) once the OS has booted in order to rescan for attached
devices and repopulate it?  Thanks for any information and let me know
if you need logs or anything else.

Regards,
Brandon

-- 
Brandon Schwartz


* Re: Improper Naming in /dev/disk/by-id and Drives Offline
  2014-09-11  2:34 Improper Naming in /dev/disk/by-id and Drives Offline Brandon R Schwartz
@ 2014-09-11  2:53 ` Greg KH
  2014-09-12 17:52 ` Brandon R Schwartz
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Greg KH @ 2014-09-11  2:53 UTC (permalink / raw)
  To: linux-hotplug

On Wed, Sep 10, 2014 at 08:34:06PM -0600, Brandon R Schwartz wrote:
> Hi,
> 
> I'm working on a particular issue (possibly two separate issues) where
> our HDDs are (1) getting mislabeled in /dev/disk/by-id and (2)
> dropping offline even though drive and controller logs show that the
> drive is communicating and working as expected.  I don't have much
> knowledge on the udev side of things so it would be great if someone
> could offer some insight into the way udev assigns device names and if
> there are thoughts as to why the OS cannot see the drive in certain
> cases (timing issue?).
> 
> The first issue, the mislabeling problem, is that on reboots or power
> cycles we occasionally see our drives become mislabeled in
> /dev/disk/by-id.  We expect to see something like:
> 
> ata-ST3000DM001-1CH166_W1F26HKK
> ata-ST3000DM001-1CH166_Z1F2FBBY
> 
> But instead we see:
> 
> ata-ST3000DM001-1CH166_W1F26HKK
> scsi-35000c500668a9bdb
> 
> The "scsi" drive is assigned a drive letter and the OS can communicate
> with the drive.  Drive logs and controller logs show the drive is
> working properly, but for some reason it's getting labeled incorrectly
> in /dev/disk/by-id.  We have looked through dmesg and enabled logging
> in udev (udevadm control --log-priority=debug), but we have not seen
> where these labels are coming from.

Sounds like blkid didn't read the uuid properly.  Is this happening in
your initrd?  Is this a systemd init system, or something else?  What
distro / version is this?  What kernel version is this?

> The second issue is slightly related to the first in that it appears
> during the same power cycle/reboot test.  We have noticed that on
> occasion, our drives will not be detected by the OS (not listed in
> /dev/disk/by-id) at all.  However, if we look at drive logs and
> controller logs, we don't see any issue.  The controller is able to
> see the drives and communicate with them, but the OS is unable to.
> Any ideas as to why communication is not established?
> 
> Also, is there a way to refresh the /dev/disk/by-id listing (udevadm
> trigger?) once the OS has booted in order to rescan for attached
> devices and repopulate it?  Thanks for any information and let me know
> if you need logs or anything else.

That depends on your distro, and how it's set up.  You could "coldplug"
the by-id values by using udevadm trigger, have you tried that?  You
shouldn't have to do it, as it sounds like you have a boot time race
condition somewhere...

thanks,

greg k-h


* Re: Improper Naming in /dev/disk/by-id and Drives Offline
  2014-09-11  2:34 Improper Naming in /dev/disk/by-id and Drives Offline Brandon R Schwartz
  2014-09-11  2:53 ` Greg KH
@ 2014-09-12 17:52 ` Brandon R Schwartz
  2014-09-12 18:03 ` Greg KH
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Brandon R Schwartz @ 2014-09-12 17:52 UTC (permalink / raw)
  To: linux-hotplug

On Wed, Sep 10, 2014 at 8:53 PM, Greg KH <greg@kroah.com> wrote:
> On Wed, Sep 10, 2014 at 08:34:06PM -0600, Brandon R Schwartz wrote:
>> Hi,
>>
>> I'm working on a particular issue (possibly two separate issues) where
>> our HDDs are (1) getting mislabeled in /dev/disk/by-id and (2)
>> dropping offline even though drive and controller logs show that the
>> drive is communicating and working as expected.  I don't have much
>> knowledge on the udev side of things so it would be great if someone
>> could offer some insight into the way udev assigns device names and if
>> there are thoughts as to why the OS cannot see the drive in certain
>> cases (timing issue?).
>>
>> The first issue, the mislabeling problem, is that on reboots or power
>> cycles we occasionally see our drives become mislabeled in
>> /dev/disk/by-id.  We expect to see something like:
>>
>> ata-ST3000DM001-1CH166_W1F26HKK
>> ata-ST3000DM001-1CH166_Z1F2FBBY
>>
>> But instead we see:
>>
>> ata-ST3000DM001-1CH166_W1F26HKK
>> scsi-35000c500668a9bdb
>>
>> The "scsi" drive is assigned a drive letter and the OS can communicate
>> with the drive.  Drive logs and controller logs show the drive is
>> working properly, but for some reason it's getting labeled incorrectly
>> in /dev/disk/by-id.  We have looked through dmesg and enabled logging
>> in udev (udevadm control --log-priority=debug), but we have not seen
>> where these labels are coming from.
>
> Sounds like blkid didn't read the uuid properly.  Is this happening in
> your initrd?  Is this a systemd init system, or something else?  What
> distro / version is this?  What kernel version is this?
>

Hi Greg,

The distro is RHEL 6.3 with kernel version 2.6.32.  We have also seen
the issue on a Debian based system with kernel  3.2.45.  We ran into
this issue again yesterday on RHEL and tested the command 'udevadm
trigger' and it repopulated /dev/disk/by-id with the correct
information.  Is there another level of debugging that we can enable
to see where the information might be getting read improperly?
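
A possible next debugging step is to compare the udev database entries of
a correctly named disk and a mislabeled one. As far as I know, the stock
persistent-storage rules build the by-id link name from the ID_BUS and
ID_SERIAL properties, so a disk that ends up as "scsi-3..." would be
expected to show ID_BUS=scsi. The sketch below is only an illustration
under that assumption: the helper names are hypothetical, it assumes
Python 3.7 or later, and it relies on "udevadm info --query=property
--name=DEVICE" printing KEY=VALUE lines.

```python
import subprocess


def parse_properties(text):
    """Parse KEY=VALUE lines, the format printed by
    `udevadm info --query=property`, into a dict."""
    props = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip lines that carry no '='
            props[key.strip()] = value
    return props


def udev_properties(device):
    """Query the udev database for `device` (for example a /dev/sdX path)."""
    result = subprocess.run(
        ["udevadm", "info", "--query=property", f"--name={device}"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_properties(result.stdout)


def summarize(props):
    """Pick out the keys that are assumed to feed the by-id link name."""
    keys = ("ID_BUS", "ID_MODEL", "ID_SERIAL", "ID_SERIAL_SHORT")
    return {k: props.get(k) for k in keys}
```

Calling summarize(udev_properties(dev)) for one good and one bad device
node and diffing the two results should show whether the ATA-specific
properties were ever imported for the bad disk.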

>> The second issue is slightly related to the first in that it appears
>> during the same power cycle/reboot test.  We have noticed that on
>> occasion, our drives will not be detected by the OS (not listed in
>> /dev/disk/by-id) at all.  However, if we look at drive logs and
>> controller logs, we don't see any issue.  The controller is able to
>> see the drives and communicate with them, but the OS is unable to.
>> Any ideas as to why communication is not established?
>>
>> Also, is there a way to refresh the /dev/disk/by-id listing (udevadm
>> trigger?) once the OS has booted in order to rescan for attached
>> devices and repopulate it?  Thanks for any information and let me know
>> if you need logs or anything else.
>
> That depends on your distro, and how it's set up.  You could "coldplug"
> the by-id values by using udevadm trigger, have you tried that?  You
> shouldn't have to do it, as it sounds like you have a boot time race
> condition somewhere...

What do you mean by 'coldplug' the by-id values with udevadm trigger?
This issue happens much more infrequently so we are still waiting for
a failure to test.  We are also looking into ways that we can
exacerbate the issue if it is a boot time race condition.
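
For reference, "coldplug" here means asking the kernel to re-emit uevents
for devices that are already present, so that udev runs its rules for
them again and recreates the symlinks. A minimal sketch of scripting
that is shown below. The function names are hypothetical, and limiting
the trigger to the block subsystem with a "change" action and a 30 second
settle timeout are choices made for this illustration, not something
stated in the thread.

```python
import subprocess


def coldplug_commands(subsystem="block", action="change", settle_timeout=30):
    """Build the udevadm invocations: replay events for one subsystem,
    then wait for the udev event queue to drain."""
    return [
        ["udevadm", "trigger",
         f"--subsystem-match={subsystem}", f"--action={action}"],
        ["udevadm", "settle", f"--timeout={settle_timeout}"],
    ]


def coldplug(dry_run=True):
    """Print the commands, or run them when dry_run is False (needs root)."""
    for cmd in coldplug_commands():
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

With dry_run left at its default the function only prints the two
commands, which makes it safe to try before running it for real on a
system that is in the failed state.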

>
> thanks,
>
> greg k-h

Regards,
Brandon

-- 
Brandon Schwartz


* Re: Improper Naming in /dev/disk/by-id and Drives Offline
  2014-09-11  2:34 Improper Naming in /dev/disk/by-id and Drives Offline Brandon R Schwartz
  2014-09-11  2:53 ` Greg KH
  2014-09-12 17:52 ` Brandon R Schwartz
@ 2014-09-12 18:03 ` Greg KH
  2014-09-12 18:53 ` Brandon R Schwartz
  2014-09-12 22:42 ` Greg KH
  4 siblings, 0 replies; 6+ messages in thread
From: Greg KH @ 2014-09-12 18:03 UTC (permalink / raw)
  To: linux-hotplug

On Fri, Sep 12, 2014 at 11:52:30AM -0600, Brandon R Schwartz wrote:
> On Wed, Sep 10, 2014 at 8:53 PM, Greg KH <greg@kroah.com> wrote:
> > On Wed, Sep 10, 2014 at 08:34:06PM -0600, Brandon R Schwartz wrote:
> >> Hi,
> >>
> >> I'm working on a particular issue (possibly two separate issues) where
> >> our HDDs are (1) getting mislabeled in /dev/disk/by-id and (2)
> >> dropping offline even though drive and controller logs show that the
> >> drive is communicating and working as expected.  I don't have much
> >> knowledge on the udev side of things so it would be great if someone
> >> could offer some insight into the way udev assigns device names and if
> >> there are thoughts as to why the OS cannot see the drive in certain
> >> cases (timing issue?).
> >>
> >> The first issue, the mislabeling problem, is that on reboots or power
> >> cycles we occasionally see our drives become mislabeled in
> >> /dev/disk/by-id.  We expect to see something like:
> >>
> >> ata-ST3000DM001-1CH166_W1F26HKK
> >> ata-ST3000DM001-1CH166_Z1F2FBBY
> >>
> >> But instead we see:
> >>
> >> ata-ST3000DM001-1CH166_W1F26HKK
> >> scsi-35000c500668a9bdb
> >>
> >> The "scsi" drive is assigned a drive letter and the OS can communicate
> >> with the drive.  Drive logs and controller logs show the drive is
> >> working properly, but for some reason it's getting labeled incorrectly
> >> in /dev/disk/by-id.  We have looked through dmesg and enabled logging
> >> in udev (udevadm control --log-priority=debug), but we have not seen
> >> where these labels are coming from.
> >
> > Sounds like blkid didn't read the uuid properly.  Is this happening in
> > your initrd?  Is this a systemd init system, or something else?  What
> > distro / version is this?  What kernel version is this?
> >
> 
> Hi Greg,
> 
> The distro is RHEL 6.3 with kernel version 2.6.32.

Then I strongly suggest you get support from Red Hat, as you are paying
for it :)

> We have also seen
> the issue on a Debian based system with kernel  3.2.45.  We ran into
> this issue again yesterday on RHEL and tested the command 'udevadm
> trigger' and it repopulated /dev/disk/by-id with the correct
> information.  Is there another level of debugging that we can enable
> to see where the information might be getting read improperly?

I don't know how RHEL is set up at all, it's such an old kernel, and
userspace, the community can't help you out, sorry.

Work with Red Hat, you are paying them, might as well take advantage of
it.

good luck,

greg k-h


* Re: Improper Naming in /dev/disk/by-id and Drives Offline
  2014-09-11  2:34 Improper Naming in /dev/disk/by-id and Drives Offline Brandon R Schwartz
                   ` (2 preceding siblings ...)
  2014-09-12 18:03 ` Greg KH
@ 2014-09-12 18:53 ` Brandon R Schwartz
  2014-09-12 22:42 ` Greg KH
  4 siblings, 0 replies; 6+ messages in thread
From: Brandon R Schwartz @ 2014-09-12 18:53 UTC (permalink / raw)
  To: linux-hotplug

On Fri, Sep 12, 2014 at 12:03 PM, Greg KH <greg@kroah.com> wrote:
> On Fri, Sep 12, 2014 at 11:52:30AM -0600, Brandon R Schwartz wrote:
>> On Wed, Sep 10, 2014 at 8:53 PM, Greg KH <greg@kroah.com> wrote:
>> > On Wed, Sep 10, 2014 at 08:34:06PM -0600, Brandon R Schwartz wrote:
>> >> Hi,
>> >>
>> >> I'm working on a particular issue (possibly two separate issues) where
>> >> our HDDs are (1) getting mislabeled in /dev/disk/by-id and (2)
>> >> dropping offline even though drive and controller logs show that the
>> >> drive is communicating and working as expected.  I don't have much
>> >> knowledge on the udev side of things so it would be great if someone
>> >> could offer some insight into the way udev assigns device names and if
>> >> there are thoughts as to why the OS cannot see the drive in certain
>> >> cases (timing issue?).
>> >>
>> >> The first issue, the mislabeling problem, is that on reboots or power
>> >> cycles we occasionally see our drives become mislabeled in
>> >> /dev/disk/by-id.  We expect to see something like:
>> >>
>> >> ata-ST3000DM001-1CH166_W1F26HKK
>> >> ata-ST3000DM001-1CH166_Z1F2FBBY
>> >>
>> >> But instead we see:
>> >>
>> >> ata-ST3000DM001-1CH166_W1F26HKK
>> >> scsi-35000c500668a9bdb
>> >>
>> >> The "scsi" drive is assigned a drive letter and the OS can communicate
>> >> with the drive.  Drive logs and controller logs show the drive is
>> >> working properly, but for some reason it's getting labeled incorrectly
>> >> in /dev/disk/by-id.  We have looked through dmesg and enabled logging
>> >> in udev (udevadm control --log-priority=debug), but we have not seen
>> >> where these labels are coming from.
>> >
>> > Sounds like blkid didn't read the uuid properly.  Is this happening in
>> > your initrd?  Is this a systemd init system, or something else?  What
>> > distro / version is this?  What kernel version is this?
>> >
>>
>> Hi Greg,
>>
>> The distro is RHEL 6.3 with kernel version 2.6.32.
>
> Then I strongly suggest you get support from Red Hat, as you are paying
> for it :)
>
>> We have also seen
>> the issue on a Debian based system with kernel  3.2.45.  We ran into
>> this issue again yesterday on RHEL and tested the command 'udevadm
>> trigger' and it repopulated /dev/disk/by-id with the correct
>> information.  Is there another level of debugging that we can enable
>> to see where the information might be getting read improperly?
>
> I don't know how RHEL is set up at all, it's such an old kernel, and
> userspace, the community can't help you out, sorry.

Haha, that is true, but we do see the failures more often on the
Debian based system.  If you think we'd be better off working with the
RHEL community or the Debian forums we'll try our luck there.  Thanks
for all the help so far!

>
> Work with Red Hat, you are paying them, might as well take advantage of
> it.
>
> good luck,
>
> greg k-h

Regards,
Brandon

-- 
Brandon Schwartz


* Re: Improper Naming in /dev/disk/by-id and Drives Offline
  2014-09-11  2:34 Improper Naming in /dev/disk/by-id and Drives Offline Brandon R Schwartz
                   ` (3 preceding siblings ...)
  2014-09-12 18:53 ` Brandon R Schwartz
@ 2014-09-12 22:42 ` Greg KH
  4 siblings, 0 replies; 6+ messages in thread
From: Greg KH @ 2014-09-12 22:42 UTC (permalink / raw)
  To: linux-hotplug

On Fri, Sep 12, 2014 at 12:53:23PM -0600, Brandon R Schwartz wrote:
> On Fri, Sep 12, 2014 at 12:03 PM, Greg KH <greg@kroah.com> wrote:
> > On Fri, Sep 12, 2014 at 11:52:30AM -0600, Brandon R Schwartz wrote:
> >> On Wed, Sep 10, 2014 at 8:53 PM, Greg KH <greg@kroah.com> wrote:
> >> > On Wed, Sep 10, 2014 at 08:34:06PM -0600, Brandon R Schwartz wrote:
> >> >> Hi,
> >> >>
> >> >> I'm working on a particular issue (possibly two separate issues) where
> >> >> our HDDs are (1) getting mislabeled in /dev/disk/by-id and (2)
> >> >> dropping offline even though drive and controller logs show that the
> >> >> drive is communicating and working as expected.  I don't have much
> >> >> knowledge on the udev side of things so it would be great if someone
> >> >> could offer some insight into the way udev assigns device names and if
> >> >> there are thoughts as to why the OS cannot see the drive in certain
> >> >> cases (timing issue?).
> >> >>
> >> >> The first issue, the mislabeling problem, is that on reboots or power
> >> >> cycles we occasionally see our drives become mislabeled in
> >> >> /dev/disk/by-id.  We expect to see something like:
> >> >>
> >> >> ata-ST3000DM001-1CH166_W1F26HKK
> >> >> ata-ST3000DM001-1CH166_Z1F2FBBY
> >> >>
> >> >> But instead we see:
> >> >>
> >> >> ata-ST3000DM001-1CH166_W1F26HKK
> >> >> scsi-35000c500668a9bdb
> >> >>
> >> >> The "scsi" drive is assigned a drive letter and the OS can communicate
> >> >> with the drive.  Drive logs and controller logs show the drive is
> >> >> working properly, but for some reason it's getting labeled incorrectly
> >> >> in /dev/disk/by-id.  We have looked through dmesg and enabled logging
> >> >> in udev (udevadm control --log-priority=debug), but we have not seen
> >> >> where these labels are coming from.
> >> >
> >> > Sounds like blkid didn't read the uuid properly.  Is this happening in
> >> > your initrd?  Is this a systemd init system, or something else?  What
> >> > distro / version is this?  What kernel version is this?
> >> >
> >>
> >> Hi Greg,
> >>
> >> The distro is RHEL 6.3 with kernel version 2.6.32.
> >
> > Then I strongly suggest you get support from Red Hat, as you are paying
> > for it :)
> >
> >> We have also seen
> >> the issue on a Debian based system with kernel  3.2.45.  We ran into
> >> this issue again yesterday on RHEL and tested the command 'udevadm
> >> trigger' and it repopulated /dev/disk/by-id with the correct
> >> information.  Is there another level of debugging that we can enable
> >> to see where the information might be getting read improperly?
> >
> > I don't know how RHEL is set up at all, it's such an old kernel, and
> > userspace, the community can't help you out, sorry.
> 
> Haha, that is true, but we do see the failures more often on the
> Debian based system.  If you think we'd be better off working with the
> RHEL community or the Debian forums we'll try our luck there.  Thanks
> for all the help so far!

The "RHEL community" is corporate support, which you are paying for,
use it!

As for the fact that it seems reproducible on two very different, and
both old, distros, it might be a hardware issue, try using a more
"modern" distro to see if it really is a kernel/udev issue, or hardware.

good luck,

greg k-h

