linux-lvm.redhat.com archive mirror
* [linux-lvm] LVM 0.8 and reiser filesystem
@ 2000-06-06 15:44 holger_zecha
  2000-06-06 16:21 ` Luca Berra
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: holger_zecha @ 2000-06-06 15:44 UTC (permalink / raw)
  To: linux-lvm


Hello,

unfortunately, LVM 0.8final (in the following simply called LVM) is not
usable in production environments.
We are thinking about using Linux in mixed SAP environments. Currently we
use NT as the platform for our application servers and HP-UX as the
database server, and we will be doing some tests with Linux as an
application server in the near future.
Are you going to implement mirroring in Linux LVM? Why is it such a big
problem to have the root filesystem on an LVM volume?
There is no easy way to get mirrored disks with LVM; mirroring works only
without LVM, but then it is not possible to resize partitions.
So we see these two points as big disadvantages:
a) root filesystem not on LVM
b) no mirroring.

Are there any solutions in the near future?

Best regards

Holger Zecha



* Re: [linux-lvm] LVM 0.8 and reiser filesystem
  2000-06-06 15:44 [linux-lvm] LVM 0.8 and reiser filesystem holger_zecha
@ 2000-06-06 16:21 ` Luca Berra
  2000-06-06 16:29 ` Brian Kress
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 13+ messages in thread
From: Luca Berra @ 2000-06-06 16:21 UTC (permalink / raw)
  To: linux-lvm

On Tue, Jun 06, 2000 at 05:44:21PM +0200, holger_zecha@hp.com wrote:
> Are you going to implement mirroring in Linux LVM ? Why is it such a big problem to have the root filesystem on a LVM volume ?
> There is no easy way to get mirrored disks with LVM, only without LVM, but there it's not possible to resize partitions.

You can layer LVM above linux-raid, so you get mirroring (or any other
RAID level, for that matter) and LVM together.

It is not a big problem to have root on LVM; the problem is having
the kernel on an LV.
Just create a normal partition and put the kernel there, then
use an initrd to start LVM before mounting root.
The reason for this is that the kernel is not bloated with code
that should live in userspace (vgscan, vgchange).
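
For example, a minimal /linuxrc inside the initrd could look like this
(a sketch only; the tool paths and the root LV name are assumptions, not
details given in this thread):

```shell
#!/bin/sh
# Hypothetical initrd /linuxrc: bring up LVM before the real root mounts.
/bin/vgscan           # scan the disks and build the LVM tables
/bin/vgchange -a y    # activate all volume groups found
# with something like root=/dev/vg00/root on the kernel command line,
# the kernel can then mount the (now visible) LV as the real root
```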

L.

-- 
Luca Berra -- bluca@comedia.it
    Communication Media & Services S.r.l.


* Re: [linux-lvm] LVM 0.8 and reiser filesystem
  2000-06-06 15:44 [linux-lvm] LVM 0.8 and reiser filesystem holger_zecha
  2000-06-06 16:21 ` Luca Berra
@ 2000-06-06 16:29 ` Brian Kress
  2000-06-06 16:41 ` Andi Kleen
  2000-06-06 17:33 ` Eric M. Hopper
  3 siblings, 0 replies; 13+ messages in thread
From: Brian Kress @ 2000-06-06 16:29 UTC (permalink / raw)
  To: holger_zecha; +Cc: linux-lvm

holger_zecha@hp.com wrote:
> 
> Are you going to implement mirroring in Linux LVM ? 

	You can get mirroring using the MD driver.  You can then
either use the MD device directly, or put LVM on top of it, in
which case all your LVs are mirrored.
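
	The layering described above might look like this with 2.4-era
raidtools (the device names, sizes and VG name are illustrative
assumptions, not anything from this thread):

```shell
# /etc/raidtab describing a two-disk mirror:
#   raiddev /dev/md0
#       raid-level      1
#       nr-raid-disks   2
#       device          /dev/sda2
#       raid-disk       0
#       device          /dev/sdb2
#       raid-disk       1

mkraid /dev/md0              # build the mirror from the raidtab entry
pvcreate /dev/md0            # then stack LVM on the mirrored device
vgcreate vg00 /dev/md0
lvcreate -L 2G -n data vg00  # every LV carved from vg00 is mirrored
```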

> Why is it such a big problem to have the root filesystem on a LVM volume ?

	This is a limit of the somewhat braindead PC architecture.
Since it doesn't have proper firmware, the boot loader
has to know where to find the kernel, and hence can only find
the kernel in data structures it knows about.
	If all you are looking for is a mirrored boot partition,
the latest versions of LILO (the Linux boot loader most commonly
used) can boot off of mirrored disks.

> So we see these two points as big disadvantages:
> a) root Filesystem not on LVM

	Well, technically you can put root in LVM (it requires
fiddling with initrd and is probably not worth it).  It's /boot
(or wherever you put the kernel) that needs to be outside LVM.

> b) no mirroring.

	Use MD.


Brian Kress
kressb@icp.siemens.com


* Re: [linux-lvm] LVM 0.8 and reiser filesystem
  2000-06-06 15:44 [linux-lvm] LVM 0.8 and reiser filesystem holger_zecha
  2000-06-06 16:21 ` Luca Berra
  2000-06-06 16:29 ` Brian Kress
@ 2000-06-06 16:41 ` Andi Kleen
  2000-06-07 12:00   ` Luca Berra
  2000-06-06 17:33 ` Eric M. Hopper
  3 siblings, 1 reply; 13+ messages in thread
From: Andi Kleen @ 2000-06-06 16:41 UTC (permalink / raw)
  To: holger_zecha; +Cc: linux-lvm

On Tue, Jun 06, 2000 at 05:44:21PM +0200, holger_zecha@hp.com wrote:
> Hello,
> 
> unfortunately, LVM 0.8final (in the following only called LVM) is not
> usable in production environments.
> We're thinking about using Linux in SAP mixed environments. Now we use
> NT as platform for application servers and HP-UX as database server. We
> are doing some tests with Linux as an application server in the near
> future.
> Are you going to implement mirroring in Linux LVM ? Why is it such a big
> problem to have the root filesystem on a LVM volume ?
> There is no easy way to get mirrored disks with LVM, only without LVM,
> but there it's not possible to resize partitions.

Actually, it is possible to run LVM on top of a RAID1 MD device (or the
other way round).

> So we see these two points as big disadvantages:
> a) root Filesystem not on LVM

This is possible with some effort (generate an initrd that contains the
LVM tools and initialise LVM before switching to the real root). The only
thing you cannot put onto the LVM is the kernel itself; it needs to be
on a separate partition. It is possible to make the kernel loading
redundant over multiple partitions too, by using the appropriate lilo
configurations and a watchdog.
[So in short it is possible, but not pretty]
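
One hedged sketch of what "appropriate lilo configurations" could mean in
practice (the device names and the second config file are assumptions,
not something stated in this thread):

```shell
lilo                              # install the boot loader on disk 1
cp -a /boot/* /boot2/             # keep a copy of the kernel on disk 2
lilo -C /etc/lilo.disk2.conf      # hypothetical config with boot=/dev/sdb
# if disk 1 fails, the machine (rebooted by the watchdog) can still be
# pointed at disk 2 and load the kernel from there
```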

On a real production system you probably should not use software RAID1
or RAID5, though: it is unreliable in the crash case because it does not
support data logging. Here a hardware RAID controller is the better
alternative. Of course you can run LVM on top of it.

> b) no mirroring.

Try doing some more research next time before coming to such conclusions?


-Andi


* Re: [linux-lvm] LVM 0.8 and reiser filesystem
  2000-06-06 15:44 [linux-lvm] LVM 0.8 and reiser filesystem holger_zecha
                   ` (2 preceding siblings ...)
  2000-06-06 16:41 ` Andi Kleen
@ 2000-06-06 17:33 ` Eric M. Hopper
  3 siblings, 0 replies; 13+ messages in thread
From: Eric M. Hopper @ 2000-06-06 17:33 UTC (permalink / raw)
  To: holger_zecha; +Cc: linux-lvm


On Tue, Jun 06, 2000 at 05:44:21PM +0200, holger_zecha@hp.com wrote:
> Hello,
> 
> unfortunately, LVM 0.8final (in the following only called LVM) is not
> usable in production environments.
> We're thinking about using Linux in SAP mixed environments. Now we use
> NT as platform for application servers and HP-UX as database server. We
> are doing some tests with Linux as an application server in the near
> future.
> Are you going to implement mirroring in Linux LVM ? Why is it such a big
> problem to have the root filesystem on a LVM volume ?
> There is no easy way to get mirrored disks with LVM, only without LVM,
> but there it's not possible to resize partitions.
> So we see these two points as big disadvantages:
> a) root Filesystem not on LVM
> b) no mirroring.
> 
> Are there any solutions in the near future ?

	I have no idea, but here are a few thoughts you may find useful.

	I have a strong suspicion that having a root partition under LVM
would actually be kind of easy.  There are two issues: one is booting
from an LVM partition, the other is having a root partition under LVM.
I think these are actually separate issues, though they're often
confused.

	I think you could get a root partition under LVM by using initrd
carefully and doing the vgscan and vgchange from a ramdisk.  I haven't
tested this, but I will when I get a chance.

	As for the mirroring thing, why do you want LVM to do this?  Why
not just have a hardware RAID 1 array?  Do you want to do periodic
disk image dumps to tape or something?

	If I'm not understanding the mirroring thing, tell me how you'd
do it without LVM.

Have fun (if at all possible),
-- 
Its name is Public Opinion.  It is held in reverence. It settles everything.
Some think it is the voice of God.  Loyalty to petrified opinion never yet
broke a chain or freed a human soul.     ---Mark Twain
-- Eric Hopper (hopper@omnifarious.mn.org http://www.omnifarious.org/~hopper) --



* Re: [linux-lvm] LVM 0.8 and reiser filesystem
  2000-06-06 16:41 ` Andi Kleen
@ 2000-06-07 12:00   ` Luca Berra
  2000-06-07 12:59     ` Andi Kleen
  2000-06-07 13:53     ` Eric M. Hopper
  0 siblings, 2 replies; 13+ messages in thread
From: Luca Berra @ 2000-06-07 12:00 UTC (permalink / raw)
  To: linux-lvm

On Tue, Jun 06, 2000 at 06:41:38PM +0200, Andi Kleen wrote:
> On a real production system you probably should not use software RAID1
> or RAID5 though. It is unreliable in the crash case though because
> it does not support data logging. In this case a hardware RAID controller
> is the better alternative. Of course you can run LVM on top of it.
I fail to get your point: what makes hw raid more reliable than sw raid?
Why are you saying that sw raid is unreliable?

L.

-- 
Luca Berra -- bluca@comedia.it
    Communication Media & Services S.r.l.


* Re: [linux-lvm] LVM 0.8 and reiser filesystem
  2000-06-07 12:00   ` Luca Berra
@ 2000-06-07 12:59     ` Andi Kleen
  2000-06-07 15:34       ` Luca Berra
  2000-06-07 16:04       ` Jos Visser
  2000-06-07 13:53     ` Eric M. Hopper
  1 sibling, 2 replies; 13+ messages in thread
From: Andi Kleen @ 2000-06-07 12:59 UTC (permalink / raw)
  To: linux-lvm

On Wed, Jun 07, 2000 at 02:00:43PM +0200, Luca Berra wrote:
> On Tue, Jun 06, 2000 at 06:41:38PM +0200, Andi Kleen wrote:
> > On a real production system you probably should not use software RAID1
> > or RAID5 though. It is unreliable in the crash case though because
> > it does not support data logging. In this case a hardware RAID controller
> > is the better alternative. Of course you can run LVM on top of it.
> I fail to get your point, what makes hw raid more reliable than sw raid?
> why are you saying that sw raid is unreliable.

RAID1 and RAID5 require an atomic update of several blocks (parity or
mirror blocks). If the machine crashes in the middle of writing such an
atomic update, the array becomes inconsistent.

In RAID5 that is very bad: e.g. when the parity block is not up to date
and another block is unreadable, you get silent data corruption. In
RAID1 with a slave device you at worst get outdated data (which may cause
problems with journaled file systems, or with programs that expect
fsync/O_SYNC to really guarantee stable on-disk storage). raidcheck can
fix that in a lot of cases, but not in all: sometimes it cannot decide
whether a block contains old or new data.

Hardware RAID usually avoids the problem by using a battery-backed
log device for atomic updates. Software RAID could do the same by
logging block updates in a log (e.g. together with the journaled
file system), but that is not implemented in Linux at the moment. It
would also be a severe performance hit.
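
The RAID5 write hole can be shown with three toy "blocks" and XOR parity;
this is purely an arithmetic illustration, not real MD behaviour:

```shell
#!/bin/bash
# Toy RAID5 write hole: a crash between the data write and the parity
# write leaves the stripe inconsistent, so a later reconstruction of a
# failed block silently returns garbage.
d0=5; d1=9
parity=$((d0 ^ d1))      # parity block = XOR of the data blocks

d0=12                    # data block updated...
                         # ...but we "crash" before rewriting parity

# now d1's disk dies and RAID5 reconstructs it from d0 and parity:
recovered=$((d0 ^ parity))
echo "real d1=$d1, reconstructed d1=$recovered"
```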


-Andi 


* Re: [linux-lvm] LVM 0.8 and reiser filesystem
  2000-06-07 12:00   ` Luca Berra
  2000-06-07 12:59     ` Andi Kleen
@ 2000-06-07 13:53     ` Eric M. Hopper
  1 sibling, 0 replies; 13+ messages in thread
From: Eric M. Hopper @ 2000-06-07 13:53 UTC (permalink / raw)
  To: linux-lvm


On Wed, Jun 07, 2000 at 02:00:43PM +0200, Luca Berra wrote:
> On Tue, Jun 06, 2000 at 06:41:38PM +0200, Andi Kleen wrote:
>> On a real production system you probably should not use software
>> RAID1 or RAID5 though. It is unreliable in the crash case though
>> because it does not support data logging. In this case a hardware
>> RAID controller is the better alternative. Of course you can run LVM
>> on top of it.
> 
> I fail to get your point, what makes hw raid more reliable than sw
> raid?  why are you saying that sw raid is unreliable.

	Because in RAID1 or 5 (mirroring, or striping with parity), you
have to write both mirrored sectors, or a sector and its parity sector
in one transaction.  Both must be at least attempted.  If one fails,
that drive needs to be flagged as bad.

	In software RAID, software failures can cause this not to
happen.  The kernel might panic at the wrong time, the power might go
out, etc, etc.  If you do it in hardware, you can use capacitors to make
sure the hardware stays up long enough to complete the transaction and
there are fewer things in the chain to fail.

	Since the whole point of RAID5, and especially RAID1, is
reliability, implementing them in ways that reduce reliability is very
questionable.

	Implementing RAID0 (simple striping) in software is just fine.
Simple striping actually reduces reliability, and is only done for
speed.  Implementing it in software does not significantly reduce
reliability beyond the amount it's reduced by having two drives that
could fail instead of one.

Have fun (if at all possible),
-- 
Its name is Public Opinion.  It is held in reverence. It settles everything.
Some think it is the voice of God.  Loyalty to petrified opinion never yet
broke a chain or freed a human soul.     ---Mark Twain
-- Eric Hopper (hopper@omnifarious.mn.org http://www.omnifarious.org/~hopper) --



* Re: [linux-lvm] LVM 0.8 and reiser filesystem
  2000-06-07 12:59     ` Andi Kleen
@ 2000-06-07 15:34       ` Luca Berra
  2000-06-07 16:14         ` Andi Kleen
  2000-06-07 16:04       ` Jos Visser
  1 sibling, 1 reply; 13+ messages in thread
From: Luca Berra @ 2000-06-07 15:34 UTC (permalink / raw)
  To: linux-lvm

On Wed, Jun 07, 2000 at 02:59:54PM +0200, Andi Kleen wrote:
> In RAID5 that is very bad (e.g. when the parity block is not up to date
> and another block is unreadable) you get silent data corruption. In
> RAID1 with a slave device you at worst get outdated data (may cause
I thought the event counter in the raid superblock was used for this
purpose.

[serious]
> Hardware RAID usually avoids the problem by using a battery backed 
> log device for atomic updates. Software Raid could do the same

-- 
Luca Berra -- bluca@comedia.it
    Communication Media & Services S.r.l.


* Re: [linux-lvm] LVM 0.8 and reiser filesystem
  2000-06-07 12:59     ` Andi Kleen
  2000-06-07 15:34       ` Luca Berra
@ 2000-06-07 16:04       ` Jos Visser
  2000-06-07 16:08         ` Andi Kleen
  1 sibling, 1 reply; 13+ messages in thread
From: Jos Visser @ 2000-06-07 16:04 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-lvm

And thus it came to pass that Andi Kleen wrote:
(on Wed, Jun 07, 2000 at 02:59:54PM +0200 to be exact)

> On Wed, Jun 07, 2000 at 02:00:43PM +0200, Luca Berra wrote:
> > On Tue, Jun 06, 2000 at 06:41:38PM +0200, Andi Kleen wrote:
> > > On a real production system you probably should not use software RAID1
> > > or RAID5 though. It is unreliable in the crash case though because
> > > it does not support data logging. In this case a hardware RAID controller
> > > is the better alternative. Of course you can run LVM on top of it.
> > I fail to get your point, what makes hw raid more reliable than sw raid?
> > why are you saying that sw raid is unreliable.
> 
> RAID1 and RAID5 require atomic update of several blocks (parity or mirror
> blocks). If the machine crashes in between writing such an atomic update
> it gets inconsistent.
> 
> In RAID5 that is very bad (e.g. when the parity block is not up to date
> and another block is unreadable) you get silent data corruption. In
> RAID1 with a slave device you at worst get outdated data (may cause
> problems with journaled file systems or programs that fsync/O_SYNC
> really guarantee stable on disk storage). raidcheck can fix that in
> a lot of cases, but not in all: sometimes it cannot decide if a 
> block contains old or new data. 
> 
> Hardware RAID usually avoids the problem by using a battery backed 
> log device for atomic updates. Software Raid could do the same
> by logging block updates in a log (e.g. together with the journaled
> file system), but that is not implemented in Linux ATM. It would
> also be a severe performance hit.

The way HP's logical volume manager does it is by maintaining a kind of 
data log somewhere in the volume metadata.  This log (let's call it the 
Mirror Write Cache) is effectively a bitmap which keeps track of which 
blocks in the logical volume are hit by a write.  The unit of 
granularity here is not an individual block, but something that is 
called a Large Track Group (LTG, let's say a couple of MB).  Whenever 
all parallel writes are finished, the corresponding LTG bit in the MWC 
is cleared and the MWC on disk is (eventually) updated.

After a crash, when the Volume Group is activated, all copies (plexes)
of a volume must be synchronized. The VM software inspects the MWC, and
then knows which blocks might be out of sync across the plexes. Only
these blocks are then synchronized using a read from the preferred plex
and write to all other plexes. The MWC is used to prevent a full sync
after a crash.
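
The bookkeeping described above can be sketched as a dirty-LTG bitmap;
the sizes and names here are made up for illustration, not HP's actual
on-disk format:

```shell
#!/bin/bash
# Mirror Write Cache sketch: set the dirty bit for a block's LTG before
# writing, clear it when all plexes are written; after a crash only the
# LTGs still marked dirty need to be resynchronized.
ltg_blocks=1024                 # blocks per Large Track Group (assumed)
declare -A mwc                  # the dirty bitmap, keyed by LTG number

write_block() {                 # usage: write_block <block_no> [crash]
    local ltg=$(( $1 / ltg_blocks ))
    mwc[$ltg]=1                 # mark dirty *before* touching any plex
    # ... writes to all plexes would happen here ...
    [ "$2" = crash ] && return  # simulated crash: the bit stays set
    unset "mwc[$ltg]"           # all plexes written: clear the bit
}

write_block 100                 # completes cleanly, bit cleared
write_block 5000 crash          # dies mid-write, its LTG stays dirty
echo "LTGs to resync after crash: ${!mwc[*]}"
```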

++Jos


-- 
The InSANE quiz master is always right!
(or was it the other way round? :-)


* Re: [linux-lvm] LVM 0.8 and reiser filesystem
  2000-06-07 16:04       ` Jos Visser
@ 2000-06-07 16:08         ` Andi Kleen
  2000-06-07 16:23           ` Jos Visser
  0 siblings, 1 reply; 13+ messages in thread
From: Andi Kleen @ 2000-06-07 16:08 UTC (permalink / raw)
  To: Jos Visser; +Cc: Andi Kleen, linux-lvm

On Wed, Jun 07, 2000 at 06:04:55PM +0200, Jos Visser wrote:
> The way HP's logical volume manager does it is by maintaining a kind of 
> data log somewhere in the volume metadata.  This log (let's call it the 
> Mirror Write Cache) is effectively a bitmap which keeps track of which 
> blocks in the logical volume are hit by a write.  The unit of 
> granularity here is not an individual block, but something that is 
> called a Large Track Group (LTG, let's say a couple of MB).  Whenever 
> all parallel writes are finished, the corresponding LTG bit in the MWC 
> is cleared and the MWC on disk is (eventually) updated.
> 
> After a crash when the Volume Group is activated, all copies (plexes)
> of a volume must be synchronized. The VM software inspects the MWC, and
> then knows which blocks might be out of sync across the plexes. Only
> these blocks are then synchronized using a read from the preferred plex
> and write to all other plexes. The MWC is used to prevent a full sync
> after a crash.

Sounds clever. I really wish Linux raid would use this optimization :-) 
(The slowness of raidcheck is a big problem) 

-Andi


* Re: [linux-lvm] LVM 0.8 and reiser filesystem
  2000-06-07 15:34       ` Luca Berra
@ 2000-06-07 16:14         ` Andi Kleen
  0 siblings, 0 replies; 13+ messages in thread
From: Andi Kleen @ 2000-06-07 16:14 UTC (permalink / raw)
  To: linux-lvm

On Wed, Jun 07, 2000 at 05:34:20PM +0200, Luca Berra wrote:
> On Wed, Jun 07, 2000 at 02:59:54PM +0200, Andi Kleen wrote:
> > In RAID5 that is very bad (e.g. when the parity block is not up to date
> > and another block is unreadable) you get silent data corruption. In
> > RAID1 with a slave device you at worst get outdated data (may cause
> I thought the event counter in the raid superblock was used for this
> purpose.

That would require adding a generation counter to every raid stripe
(= losing at least one block per stripe), and also writing the
superblock before every write.


-Andi


* Re: [linux-lvm] LVM 0.8 and reiser filesystem
  2000-06-07 16:08         ` Andi Kleen
@ 2000-06-07 16:23           ` Jos Visser
  0 siblings, 0 replies; 13+ messages in thread
From: Jos Visser @ 2000-06-07 16:23 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-lvm

And thus it came to pass that Andi Kleen wrote:
(on Wed, Jun 07, 2000 at 06:08:03PM +0200 to be exact)

> On Wed, Jun 07, 2000 at 06:04:55PM +0200, Jos Visser wrote:
> > The way HP's logical volume manager does it is by maintaining a kind of 
> > data log somewhere in the volume metadata.  This log (let's call it the 
> > Mirror Write Cache) is effectively a bitmap which keeps track of which 
> > blocks in the logical volume are hit by a write.  The unit of 
> > granularity here is not an individual block, but something that is 
> > called a Large Track Group (LTG, let's say a couple of MB).  Whenever 
> > all parallel writes are finished, the corresponding LTG bit in the MWC 
> > is cleared and the MWC on disk is (eventually) updated.
> > 
> > After a crash when the Volume Group is activated, all copies (plexes)
> > of a volume must be synchronized. The VM software inspects the MWC, and
> > then knows which blocks might be out of sync across the plexes. Only
> > these blocks are then synchronized using a read from the preferred plex
> > and write to all other plexes. The MWC is used to prevent a full sync
> > after a crash.
> 
> Sounds clever. I really wish Linux raid would use this optimization :-) 

Well, the source is there, unfortunately I've got something to do tonight :-)

++Jos

> (The slowness of raidcheck is a big problem) 
> 
> -Andi

-- 
The InSANE quiz master is always right!
(or was it the other way round? :-)

