* RE: RAID1 + LVM not detected during boot on 2.6.9
@ 2004-12-15 16:40 Stephen Warren
  2004-12-15 19:41 ` bug in sym53c8xx? [Was: RAID1 + LVM not detected during boot on 2.6.9] Aleksandar Milivojevic
  0 siblings, 1 reply; 4+ messages in thread
From: Stephen Warren @ 2004-12-15 16:40 UTC
  To: Aleksandar Milivojevic, Linux Kernel Mailing List

From: linux-kernel-owner@vger.kernel.org 
> I've installed one machine (Fedora Core 3 distro) with /boot
> on a RAID1 device (md0) and all other filesystems on LVM
> volumes located on another RAID1 device (md1).  There was
> only one volume group, with a couple of volumes for file
> systems (one of them was the root file system).

I have this exact same setup, and it's working great.

You do have the correct partition types set up, right? The underlying
RAID partitions should be type 0xfd (Linux raid autodetect).

Also, where are your disks attached? Are you really sure that the initrd
contains the kernel driver for your host controller? Perhaps you should
edit the linuxrc (I think) script file to cat the contents of some /proc
files to prove that the disks are known to the kernel. Perhaps even add
sfdisk to the initrd, and have it dump out the partition tables etc. at
boot time.
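
A minimal sketch of that kind of debugging aid, assuming the initrd
contains a POSIX shell (for example busybox sh). As far as I know, the
stock Fedora initrd runs its script under nash, which has no functions or
loops, so this would not work there as is. The helper name dump_proc and
the choice of /proc files are illustrative assumptions, not something
taken from the thread.

```shell
#!/bin/sh
# Hypothetical debugging helper for an initrd linuxrc script.
# dump_proc prints every readable file it is given, preceded by a header,
# and reports files that are missing or unreadable.
dump_proc() {
    for f in "$@"; do
        if [ -r "$f" ]; then
            echo "=== $f ==="
            cat "$f"
        else
            echo "=== $f (not readable) ==="
        fi
    done
}

# Possible use after the host-controller module has been loaded
# (assumes /proc is mounted; the file list is only a suggestion):
#   dump_proc /proc/partitions /proc/scsi/scsi /proc/mdstat
```

If the disks are missing from /proc/partitions at that point in the
script, the RAID and LVM steps that follow cannot find them either.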

For example, fdisk says this about one of my disks:

SEVERN:~$ sudo fdisk /dev/hdg

The number of cylinders for this disk is set to 30515.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)

Command (m for help): 

Disk /dev/hdg: 251.0 GB, 251000193024 bytes
255 heads, 63 sectors/track, 30515 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/hdg1   *           1          13      104391   fd  Linux raid autodetect
/dev/hdg2              14       30226   242685922+  fd  Linux raid autodetect
/dev/hdg3           30227       30357     1052257+  82  Linux swap

-- 
Stephen Warren, Software Engineer, NVIDIA, Fort Collins, CO
swarren@nvidia.com        http://www.nvidia.com/
swarren@wwwdotorg.org     http://www.wwwdotorg.org/pgp.html

* bug in sym53c8xx? [Was: RAID1 + LVM not detected during boot on 2.6.9]
  2004-12-15 16:40 RAID1 + LVM not detected during boot on 2.6.9 Stephen Warren
@ 2004-12-15 19:41 ` Aleksandar Milivojevic
  2004-12-16  5:05   ` Alexander E. Patrakov
  0 siblings, 1 reply; 4+ messages in thread
From: Aleksandar Milivojevic @ 2004-12-15 19:41 UTC
  To: Linux Kernel Mailing List

Stephen Warren wrote:
> From: linux-kernel-owner@vger.kernel.org 
> 
>>I've installed one machine (Fedora Core 3 distro) with /boot
>>on a RAID1 device (md0) and all other filesystems on LVM
>>volumes located on another RAID1 device (md1).  There was
>>only one volume group, with a couple of volumes for file
>>systems (one of them was the root file system).
> 
> You do have the correct partition types setup, right? The underlying
> RAID partitions should be type 0xfd (Linux raid autodetect). Also, where
> are your disks attached - are you really sure that the kernel has
> drivers for your host controller in the initrd

I did a bit of troubleshooting and found where the problem was.

The problem seems to be that sym53c8xx is "slow" in detecting disks 
connected to the SCSI controller.

The timeline (during normal boot) looks something like this:

  - sym53c8xx is loaded and starts detecting disks
  - raid1 and dm-* modules are loaded
  - raidautorun and lvm vgscan are executed

raid1 module doesn't find anything since sym53c8xx hasn't yet reported 
any disk drives.

If I insert "sleep 30" (a shorter value would probably work too) after
the "insmod sym53c8xx" line in the init script and then reboot,
everything works.  sym53c8xx has enough time to find the disk drives, so
they are there when the next steps are taken (loading of the raid1 and
dm-* modules, and execution of raidautorun and lvm vgscan).

Is insmod supposed to wait until the driver has initialized?  I'm not
sure.

In the 2.4.x kernel days, I remember a different driver was used for
this SCSI card (Symbios Logic 53c1010 Ultra3 SCSI Adapter).  It did not
suffer from this problem (it detected disks fast enough that the
subsequent loading/initialization of raid1 worked).

The question is if this is:

   - bug in sym53c8xx driver?
   - bug in insmod?
   - bug in init script built by mkinitrd (missing sleep)?
   - bug in design of initrd?

If this might be a bug in sym53c8xx, let me know, and I'll file a bug
in Bugzilla.

Note about hardware (if somebody attempts to reproduce the problem):

The SCSI controller in question is integrated onto a dual P-III
motherboard (Asus CUV4X-DLS, only one CPU installed currently, running a
single-processor kernel).  There are two of them on the motherboard.
The first SCSI controller doesn't function properly, so the two disk
drives are connected to the second SCSI controller.  The first SCSI
controller is disabled in the Symbios BIOS (but it seems that Linux
doesn't care about that).  All other BIOS settings are set to default
values.  Yeah, I know there's some faulty hardware involved, but I can't
rip it out of the motherboard, Linux ignores the fact that it is
disabled, and there's nothing connected to it.  Plus, the old 2.4.x
driver was able to handle it without any problems.

-- 
Aleksandar Milivojevic <amilivojevic@pbl.ca>    Pollard Banknote Limited
Systems Administrator                           1499 Buffalo Place
Tel: (204) 474-2323 ext 276                     Winnipeg, MB  R3T 1L7

* Re: bug in sym53c8xx? [Was: RAID1 + LVM not detected during boot on 2.6.9]
  2004-12-15 19:41 ` bug in sym53c8xx? [Was: RAID1 + LVM not detected during boot on 2.6.9] Aleksandar Milivojevic
@ 2004-12-16  5:05   ` Alexander E. Patrakov
  2004-12-16 18:20     ` Aleksandar Milivojevic
  0 siblings, 1 reply; 4+ messages in thread
From: Alexander E. Patrakov @ 2004-12-16  5:05 UTC
  To: linux-kernel

Aleksandar Milivojevic wrote:

> The timeline (during normal boot) looks something like this:
> 
>   - sym53c8xx is loaded and starts detecting disks
>   - raid1 and dm-* modules are loaded
>   - raidautorun and lvm vgscan are executed
> 
> raid1 module doesn't find anything since sym53c8xx hasn't yet reported
> any disk drives.
> 
> If I insert "sleep 30" (a shorter value would probably work too) after
> the "insmod sym53c8xx" line in the init script and then reboot,
> everything works.  sym53c8xx has enough time to find the disk drives, so
> they are there when the next steps are taken (loading of the raid1 and
> dm-* modules, and execution of raidautorun and lvm vgscan).
> 
> Is insmod supposed to wait until the driver has initialized?  I'm not
> sure.
> 
> In the 2.4.x kernel days, I remember a different driver was used for
> this SCSI card (Symbios Logic 53c1010 Ultra3 SCSI Adapter).  It did not
> suffer from this problem (it detected disks fast enough that the
> subsequent loading/initialization of raid1 worked).
> 
> The question is if this is:
> 
>    - bug in sym53c8xx driver?
>    - bug in insmod?
>    - bug in init script built by mkinitrd (missing sleep)?
>    - bug in design of initrd?
> 
> If this might be a bug in sym53c8xx, let me know, and I'll file a bug
> in Bugzilla.

I am not sure how to classify this bug properly. 

First, the driver correctly follows the behaviour described by Greg KH: it
should load successfully even if there is no corresponding hardware. Then
(in the already-loaded state) it should generate hotplug events when the
hardware says it's present (and that's slow). Greg KH says that in the
hotplug world (i.e., in reality) nothing else is possible. From this
viewpoint, the bug is in the linuxrc script provided by Red Hat. It should
really either poll and sleep until the disk appears, or use udev to get a
notification when the disk is really accessible.
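
The "poll and sleep" variant could look roughly like the sketch below. It
assumes a POSIX shell in the initrd and a mounted sysfs; the function name
wait_for_path, the module path and the device name sda are made up for
illustration. Note that seeing one disk appear does not prove that the
driver has finished scanning the whole bus.

```shell
#!/bin/sh
# wait_for_path PATH [TIMEOUT_SECONDS]
# Polls once per second until PATH exists or the timeout (default 30)
# expires. Returns 0 if the path appeared in time, 1 otherwise.
wait_for_path() {
    path=$1
    timeout=${2:-30}
    waited=0
    while [ ! -e "$path" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Hypothetical use in linuxrc instead of a fixed "sleep 30"
# (module path and device name are assumptions):
#   insmod /lib/sym53c8xx.ko
#   wait_for_path /sys/block/sda 30 || echo "sda did not appear"
#   insmod /lib/raid1.ko
```

Compared with a fixed sleep, this returns as soon as the device shows up
and still gives up after a bounded time when it never does.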

Second, a similar problem was discussed on LKML in September, under the
title "udev is too slow creating devices". While originally concerned with
asynchronicity due to udev itself, this thread also touches on the idea
that udev doesn't really add asynchronicity, since PCI devices are really
hot-pluggable and the bus is already asynchronous. The thread starts here:

http://lkml.org/lkml/2004/9/14/298

and continues here:

http://lkml.org/lkml/2004/9/18/89

In this thread, Benjamin Herrenschmidt wrote:

> Nope, Greg is right. Drivers themselves won't necessarily provide
> you with the device interface in a synchronous way after they are
> loaded, and some will certainly never. It is all an asynchronous process
> and there is simply no way to ask for any kind of enforced synchronicity
> here without major bloatage.

However, we _do_ need this synchronicity, and therefore _have_ to live with
major bloatage either in the kernel (currently it's not there) or in
userspace (your "sleep for a magic number of seconds" here and there in
linuxrc and bootscripts). My opinion is that the userspace bloatage is
not centralized and is hard to audit, and therefore the issue of
supporting the statement "This piece of hardware must be there, it's not
really hotpluggable" must be dealt with.

P.S.: FreeBSD explicitly sleeps 15 seconds for SCSI devices to settle.

-- 
Alexander E. Patrakov


* Re: bug in sym53c8xx? [Was: RAID1 + LVM not detected during boot on 2.6.9]
  2004-12-16  5:05   ` Alexander E. Patrakov
@ 2004-12-16 18:20     ` Aleksandar Milivojevic
  0 siblings, 0 replies; 4+ messages in thread
From: Aleksandar Milivojevic @ 2004-12-16 18:20 UTC
  To: Alexander E. Patrakov; +Cc: linux-kernel

Alexander E. Patrakov wrote:
> I am not sure how to classify this bug properly. 
> 
> First, the driver correctly follows the behaviour described by Greg KH: it
> should load successfully even if there is no corresponding hardware. Then
> (in the already-loaded state) it should generate hotplug events when the
> hardware says it's present (and that's slow). Greg KH says that in the
> hotplug world (i.e., in reality) nothing else is possible. From this
> viewpoint, the bug is in the linuxrc script provided by Red Hat. It should
> really either poll and sleep until the disk appears, or use udev to get a
> notification when the disk is really accessible.

I've just stumbled on another problem that seems to boil down to the
same thing.  This time the modules were not PCI related.  They were file
system modules: jbd and ext3.

ext3.ko needs some symbols that are defined in jbd.ko.  The linuxrc
(init) script first loads jbd.ko, and then ext3.ko.  ext3.ko complains
about unknown symbols (journal_*) that are defined in jbd.ko and fails
to load.  Again, inserting a sleep between the invocations of insmod
solves the problem.
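
One way to look into this without a blind sleep is to check what the
kernel reports for jbd before ext3 is loaded. The sketch below is a
diagnostic under assumptions, not a confirmed fix: it assumes a POSIX
shell in the initrd and the 2.6 /proc/modules layout (name, size, use
count, dependants, state, address), and the function name module_state is
invented. Whether the state column would actually explain the failure
described above is not known.

```shell
#!/bin/sh
# module_state NAME [FILE]
# Prints the state column (Live, Loading or Unloading) for module NAME
# from a /proc/modules-format file (default: /proc/modules).
# Returns 1 and prints nothing if the module is not listed.
module_state() {
    name=$1
    file=${2:-/proc/modules}
    # Fields: name size use-count dependants state address
    while read -r mod size refs deps state addr; do
        if [ "$mod" = "$name" ]; then
            echo "$state"
            return 0
        fi
    done < "$file"
    return 1
}

# Hypothetical use between the two insmod calls (module paths assumed):
#   insmod /lib/jbd.ko
#   echo "jbd state: $(module_state jbd)"
#   insmod /lib/ext3.ko
```

If jbd is reported as something other than Live at that point, that
would support the race theory; if it is already Live, the cause is
probably elsewhere.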

The problem was first reported on the Fedora Users mailing list, in the
thread "FC3 SMP builds do NOT contain ext3 drivers in the build!!!!"
(a somewhat incorrect subject line: the ext3 driver was included, but it
failed to load due to unknown symbols).  It seems that the OP migrated his
file systems back to ext2 in order to be able to boot.  A couple of
people experienced this race condition problem.

Now, this might be a linuxrc (init) script problem.  But if, in order
to boot reliably, we need to add a "sleep 10" line after each and every
module loaded from the initrd image, it becomes ridiculous.  Shouldn't
there be a way to load a module and wait until it signals "I'm done with
initialization" before the insmod command exits?  This would solve the
problems with hot-pluggable devices too (the driver would scan the bus,
then signal that it is initialized, and after that it would continue with
its hot-pluggable agenda).

-- 
Aleksandar Milivojevic <amilivojevic@pbl.ca>    Pollard Banknote Limited
Systems Administrator                           1499 Buffalo Place
Tel: (204) 474-2323 ext 276                     Winnipeg, MB  R3T 1L7
