From mboxrd@z Thu Jan 1 00:00:00 1970
From: Brandon R Schwartz
Date: Fri, 12 Sep 2014 17:52:30 +0000
Subject: Re: Improper Naming in /dev/disk/by-id and Drives Offline
Message-Id:
List-Id:
References:
In-Reply-To:
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
To: linux-hotplug@vger.kernel.org

On Wed, Sep 10, 2014 at 8:53 PM, Greg KH wrote:
> On Wed, Sep 10, 2014 at 08:34:06PM -0600, Brandon R Schwartz wrote:
>> Hi,
>>
>> I'm working on a particular issue (possibly two separate issues) where
>> our HDDs are (1) getting mislabeled in /dev/disk/by-id and (2)
>> dropping offline even though drive and controller logs show that the
>> drive is communicating and working as expected. I don't have much
>> knowledge on the udev side of things so it would be great if someone
>> could offer some insight into the way udev assigns device names and if
>> there are thoughts as to why the OS cannot see the drive in certain
>> cases (timing issue?).
>>
>> The first issue, the mislabeling problem, is that on reboots or power
>> cycles we occasionally see our drives become mislabeled in
>> /dev/disk/by-id. We expect to see something like:
>>
>> ata-ST3000DM001-1CH166_W1F26HKK
>> ata-ST3000DM001-1CH166_Z1F2FBBY
>>
>> But instead we see:
>>
>> ata-ST3000DM001-1CH166_W1F26HKK
>> scsi-35000c500668a9bdb
>>
>> The "scsi" drive is assigned a drive letter and the OS can communicate
>> with the drive. Drive logs and controller logs show the drive is
>> working properly, but for some reason it's getting labeled incorrectly
>> in /dev/disk/by-id. We have looked through dmesg and enabled logging
>> in udev (udevadm control --log-priority=debug), but we have not seen
>> where these labels are coming from.
>
> Sounds like blkid didn't read the uuid properly. Is this happening in
> your initrd? Is this a systemd init system, or something else? What
> distro / version is this? What kernel version is this?
>

Hi Greg,

The distro is RHEL 6.3 with kernel version 2.6.32. We have also seen
the issue on a Debian-based system with kernel 3.2.45. We ran into this
issue again yesterday on RHEL and ran 'udevadm trigger', which
repopulated /dev/disk/by-id with the correct information. Is there
another level of debugging that we can enable to see where the
information might be getting read improperly?

>> The second issue is slightly related to the first in that it appears
>> during the same power cycle/reboot test. We have noticed that on
>> occasion, our drives will not be detected by the OS (not listed in
>> /dev/disk/by-id) at all. However, if we look at drive logs and
>> controller logs, we don't see any issue. The controller is able to
>> see the drives and communicate with them, but the OS is unable to.
>> Any ideas as to why communication is not established?
>>
>> Also, is there a way to refresh the /dev/disk/by-id listing (udevadm
>> trigger?) once the OS has booted in order to rescan for attached
>> devices and repopulate it? Thanks for any information and let me know
>> if you need logs or anything else.
>
> That depends on your distro, and how it's set up. You could "coldplug"
> the by-id values by using udevadm trigger, have you tried that? You
> shouldn't have to do it, as it sounds like you have a boot time race
> condition somewhere...

What do you mean by 'coldplug' the by-id values with udevadm trigger?
This second issue happens much less frequently, so we are still waiting
for a failure to test it on. We are also looking into ways to make the
issue occur more often, in case it is a boot time race condition.

>
> thanks,
>
> greg k-h

Regards,
Brandon

-- 
Brandon Schwartz
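
A check along these lines could flag the mislabeling automatically after
each power cycle. It is a minimal sketch, not a tested tool: the function
names are hypothetical, and the assumption that every healthy drive has at
least one "ata-" link in /dev/disk/by-id comes only from the expected
listing quoted above. It groups the by-id links by the device node they
resolve to and reports any device that has no link with the expected
prefix.

```python
import os
from collections import defaultdict


def devices_missing_prefix(links, expected_prefix="ata-"):
    """Return {device node: [link names]} for devices lacking an expected link.

    `links` maps a by-id link name to the device node it resolves to,
    e.g. {"scsi-35000c500668a9bdb": "/dev/sdb"} (node names illustrative).
    """
    by_target = defaultdict(list)
    for name, target in links.items():
        by_target[target].append(name)
    # A device is suspect when none of its links carries the expected prefix.
    return {
        target: sorted(names)
        for target, names in by_target.items()
        if not any(n.startswith(expected_prefix) for n in names)
    }


def read_by_id(by_id_dir="/dev/disk/by-id"):
    """Resolve every symlink in by_id_dir to the device node it points at."""
    return {
        name: os.path.realpath(os.path.join(by_id_dir, name))
        for name in os.listdir(by_id_dir)
    }
```

Calling devices_missing_prefix(read_by_id()) once the system is up would
return an empty dict on a good boot; a non-empty result marks a boot worth
capturing logs from before running 'udevadm trigger'.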