linux-lvm.redhat.com archive mirror
* Re: [linux-lvm] Re: IBM to release LVM Technology to the Linux
@ 2000-06-22 19:37 benr
  2000-06-23  1:23 ` Dale Kemp
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: benr @ 2000-06-22 19:37 UTC (permalink / raw)
  To: Martin K. Petersen; +Cc: linux-lvm



Martin,

Welcome to the discussion, and thanks for sharing your opinion.

>>>>> "Andreas" == Andreas Dilger <adilger@turbolabs.com> writes:

Andreas> Having used both the AIX LVM, the Linux LVM, and the good-old
Andreas> DOS partitions, I would have to disagree with your statement
Andreas> that logical extents are of very little benefit.  One of the
Andreas> worst things to do in a DOS-partitioned world is to resize
Andreas> the partitions themselves.  You always have to over-estimate
Andreas> the partition sizes in case you need more space in the
Andreas> future, or add a whole new partition if you run out of space
Andreas> in the existing partition.

Martin>Andreas, you are preaching to the choir.  Partitions don't belong in
Martin>an LVM architecture at all.  They are a legacy thing which needs to go
Martin>away.

What is a partition?  By definition, a partition is a contiguous block of
disk space.

What is an extent?  By definition, an extent is a contiguous block of disk
space.

What is the difference between them?  Besides the names, nothing in and of
themselves.  What's different is how they are managed, specifically the
structures and rules used to manage them.

Martin>Also, I don't tend to agree with most of the infrastructure proposed
Martin>in the IBM whitepaper.  If IBM's intention is that this will be a
Martin>cross-OS LVM architecture, well then fine -- lots of abstractions are
Martin>obviously needed.

Our customers have asked for a migration path to Linux from their existing
platforms.  Migration is made much easier and much less expensive when the
new platform can read and write to the volumes/partitions/etc. used by the
old platform.  Furthermore, most of our customers will require a testing
period where the new platform and the old platform are compared on the same
hardware using the same datasets.  Again, the ability of the new platform
to use the volumes/partitions/etc. of the old platform can greatly speed
the testing process while reducing costs.  Finally, an LVMS based upon the
architecture in the white paper would, with the proper plug-in modules,
ease migration from several platforms (both IBM and non-IBM) to Linux by
allowing Linux to use the volumes/partitions/etc. of these platforms.

As for being a cross-OS LVM architecture, that is true.  This
architecture was designed with the idea of being OS-neutral where possible,
and is being considered for implementation on other IBM platforms.

Martin>If it is supposed to be Linux specific, however, I don't see why one
Martin>would waste engineering resources implementing plug-ins for reading
Martin>Macintosh partition types etc.  We already have an adequate framework
Martin>for that in the kernel.

The Macintosh partitioning scheme was merely an example.  With the proper
plug-ins, an LVMS based upon the architecture in the white paper would be
able to access and manipulate logical volumes (and volume groups) created
by AIX, for example.  Again, though, implementing support for other OS
partitioning schemes and logical volume management schemes aids in
migration to Linux.

Another thing to remember is that users want power without risk.  This is
especially true in the corporate world.  To make it there, Linux needs a
very powerful, flexible logical volume management system which minimizes
the risk of losing data.  This calls for an architecture which integrates
all aspects of volume/disk management into a single, easy-to-use entity.
All processes which could be automated should be automated to prevent
"accidents", such as the improper shrinking of a volume containing data.
Right now it is rather easy to accidentally shrink a volume before
shrinking the filesystem on the volume, or to shrink the filesystem on the
volume by the wrong amount.  Is fdisk volume-group aware (I have not tried
this yet)?  If it isn't, a user could make a mistake and delete a partition
which belongs to a volume group.  The current system has holes in it, and
these holes need to be plugged before Linux can be a major player in the
corporate world.  These holes can be plugged in a patchwork fashion, or
they can be eliminated by adopting an architecture (not necessarily the one
in the white paper) in which they don't exist or can't occur.

Martin>The scheme I've been toying with over the past months:
Martin>
Martin> - Logical Disk = Either partition or whole disk.
Martin>
Martin> - The Logical Disk provides allocation space for extents.
Martin>
Martin> - Extents are allocated on the available logical disks based upon
Martin>   heuristics in the feature set/system administrator preferences.
Martin>
Martin> - Logical Volume consists of one or more extents accessed through one
Martin>   or more feature sets (RAID0, RAID1, RAID5, encryption, whatever).
Martin>
Martin>The extents can be of varying size depending on the application.  A 30
Martin>GB RAID5 LV could be constructed from 4 x 10 GB extents on 4 different
Martin>physical disks + a 10 GB hot spare extent on a fifth disk, for
Martin>instance.
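Martin's scheme (logical disks providing allocation space, variable-sized extents placed by a heuristic, and logical volumes built from extents plus a feature set) can be sketched as a small data model. This is a minimal illustration, not code from any LVM implementation: the class names, the whole-GB granularity, and the "most free space, different disk per extent" placement rule in `allocate_extent` are hypothetical stand-ins for the heuristics and administrator preferences he mentions.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class LogicalDisk:
    """Either a partition or a whole disk; provides allocation space for extents."""
    name: str
    size_gb: int
    used_gb: int = 0  # space already handed out, allocated from the start

    @property
    def free_gb(self) -> int:
        return self.size_gb - self.used_gb


@dataclass
class Extent:
    """A contiguous piece of one logical disk; sizes may vary per application."""
    disk_name: str
    start_gb: int
    size_gb: int


@dataclass
class LogicalVolume:
    """One or more extents accessed through a feature set (e.g. "RAID5")."""
    name: str
    feature: str
    extents: List[Extent] = field(default_factory=list)


def allocate_extent(disks: List[LogicalDisk], size_gb: int, avoid: Set[str]) -> Extent:
    """Hypothetical heuristic: use the logical disk with the most free space,
    skipping disks named in `avoid` (e.g. ones this volume already uses)."""
    candidates = [d for d in disks if d.free_gb >= size_gb and d.name not in avoid]
    if not candidates:
        raise RuntimeError(f"no logical disk has {size_gb} GB free")
    disk = max(candidates, key=lambda d: d.free_gb)  # first disk wins ties
    extent = Extent(disk.name, disk.used_gb, size_gb)
    disk.used_gb += size_gb
    return extent


def build_volume(disks: List[LogicalDisk], name: str, feature: str,
                 count: int, size_gb: int) -> LogicalVolume:
    """Allocate `count` extents of `size_gb`, each on a different logical disk."""
    lv = LogicalVolume(name, feature)
    for _ in range(count):
        used = {e.disk_name for e in lv.extents}
        lv.extents.append(allocate_extent(disks, size_gb, used))
    return lv


# The example above: four 10 GB extents on four disks, plus a hot spare on a fifth.
disks = [LogicalDisk(f"disk{i}", 18) for i in range(5)]
lv = build_volume(disks, "data", "RAID5", count=4, size_gb=10)
spare = allocate_extent(disks, 10, avoid={e.disk_name for e in lv.extents})
```

The "different disk per extent" rule is what a RAID5 feature set needs; a plain concatenated volume could drop the `avoid` set entirely.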

I don't understand your example.  Change the word extent to partition and
you have:

 A 30 GB RAID5 LV could be constructed from 4 x 10 GB partitions on 4 different
physical disks + a 10 GB hot spare partition on a fifth disk, for instance.

This is exactly what you would have under the LVMS described in the white
paper.  Your example shows nothing which could not be done through the LVMS
architecture with partitions.  Am I missing something here?
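Ben's point can be made concrete: the capacity arithmetic in Martin's example depends only on how many members there are and how large they are, not on whether the members are called extents or partitions. A minimal sketch, assuming ordinary single-parity RAID5 and ignoring metadata overhead:

```python
from typing import Sequence


def raid5_usable_gb(member_sizes_gb: Sequence[int]) -> int:
    """Usable capacity of a RAID5 set.

    One member's worth of space holds parity, and each member contributes
    only as much as the smallest member. A hot spare is not a member: it
    holds no data until it replaces a failed member.
    """
    if len(member_sizes_gb) < 3:
        raise ValueError("RAID5 needs at least three members")
    return (len(member_sizes_gb) - 1) * min(member_sizes_gb)


# Four 10 GB pieces, whether they are called extents or partitions.
print(raid5_usable_gb([10, 10, 10, 10]))  # 30
```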


Regards,

Ben

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [linux-lvm] Re: IBM to release LVM Technology to the Linux
  2000-06-22 19:37 [linux-lvm] Re: IBM to release LVM Technology to the Linux benr
@ 2000-06-23  1:23 ` Dale Kemp
  2000-06-23 20:55 ` Martin K. Petersen
  2000-06-26 19:16 ` [linux-lvm] " Andreas Dilger
  2 siblings, 0 replies; 5+ messages in thread
From: Dale Kemp @ 2000-06-23  1:23 UTC (permalink / raw)
  To: Linux LVM mailing list

> Another thing to remember is that users want power without risk.  This is
> especially true in the corporate world.  To make it there, Linux needs a
> very powerful, flexible logical volume management system which minimizes
> the risk of losing data.  This calls for an architecture which integrates
> all aspects of volume/disk management into a single, easy-to-use entity.
> All processes which could be automated should be automated to prevent
> "accidents", such as the improper shrinking of a volume containing data.
> Right now it is rather easy to accidentally shrink a volume before
> shrinking the filesystem on the volume, or to shrink the filesystem on the
> volume by the wrong amount.  Is fdisk volume-group aware (I have not tried
> this yet)?  If it isn't, a user could make a mistake and delete a partition
> which belongs to a volume group.  The current system has holes in it, and
> these holes need to be plugged before Linux can be a major player in the
> corporate world.  These holes can be plugged in a patchwork fashion, or
> they can be eliminated by adopting an architecture (not necessarily the one
> in the white paper) in which they don't exist or can't occur.

% man e2fsadm

DESCRIPTION
       e2fsadm allows resizing of a logical volume  (see  lvm(8),
       lvcreate(8))  containing  an unmounted ext2 filesystem and
       then extending the filesystem by  resize2fs(8)  afterwards
       or  reducing  the  filesystem  first and then reducing the
       logical volume afterwards.
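The ordering that the e2fsadm man page describes can be expressed as a tiny planning function. This is a hypothetical sketch, not e2fsadm's code: the step strings are descriptive labels rather than real command lines, and it assumes, as the man page says, that the ext2 filesystem is unmounted.

```python
from typing import List


def plan_resize(current_mb: int, target_mb: int) -> List[str]:
    """Return the safe order of steps for resizing an LV holding an
    unmounted ext2 filesystem (labels only; nothing is executed).

    Growing: extend the logical volume first, then the filesystem.
    Shrinking: shrink the filesystem first, then the logical volume, so the
    volume is never smaller than the filesystem stored on it.
    """
    if target_mb > current_mb:
        return [f"extend LV to {target_mb} MB", f"grow filesystem to {target_mb} MB"]
    if target_mb < current_mb:
        return [f"shrink filesystem to {target_mb} MB", f"reduce LV to {target_mb} MB"]
    return []  # nothing to do


print(plan_resize(2048, 1024))
# ['shrink filesystem to 1024 MB', 'reduce LV to 1024 MB']
```

Encoding the order in one place like this is the kind of automation Ben asks for: the user states the target size and never has to remember which layer goes first.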

First, Linux-LVM is still evolving and will only get better. IBM
and SGI have their own volume management systems, which is fine, and
porting them to Linux can only be a good thing too. At the end of the day
it's the users in the community that choose. It is in the interest of both
the community and IBM's users for IBM to port its AIX systems to Linux, so
that people can simply install Linux and use their existing AIX hard drives.
The same goes for SGI, and work is already underway on JFS and XFS, for example.
I actually like the system being evolved by Linux-LVM, since it follows the
Unix philosophy: do one thing and do it well (the opposite of Micr$oft).

-- Dale.


* Re: [linux-lvm] Re: IBM to release LVM Technology to the Linux
  2000-06-22 19:37 [linux-lvm] Re: IBM to release LVM Technology to the Linux benr
  2000-06-23  1:23 ` Dale Kemp
@ 2000-06-23 20:55 ` Martin K. Petersen
  2000-06-26 19:16 ` [linux-lvm] " Andreas Dilger
  2 siblings, 0 replies; 5+ messages in thread
From: Martin K. Petersen @ 2000-06-23 20:55 UTC (permalink / raw)
  To: benr; +Cc: linux-lvm

>>>>> "Ben" == benr <benr@us.ibm.com> writes:

[Musings deleted]

Ben> This is exactly what you would have under the LVMS described in
Ben> the white paper.  Your example shows nothing which could not be
Ben> done through the LVMS architecture with partitions.  Am I missing
Ben> something here?

I am advocating:

1. Using existing partitioning schemes to encapsulate logical disks.

   This is only relevant in the migration case and if people want to
   test LVMS on a few disks that are already partitioned.  If your
   system has hundreds of disks connected, you'll use a 1:1 physical
   to logical disk mapping anyway.

2. Not using existing partitioning schemes to encapsulate logical
   partitions/extents.

-- 
Martin K. Petersen, Principal Linux Consultant, Linuxcare, Inc.
mkp@linuxcare.com, http://www.linuxcare.com/
Linuxcare. Support for the revolution.


* Re: [linux-lvm] IBM to release LVM Technology to the Linux
  2000-06-22 19:37 [linux-lvm] Re: IBM to release LVM Technology to the Linux benr
  2000-06-23  1:23 ` Dale Kemp
  2000-06-23 20:55 ` Martin K. Petersen
@ 2000-06-26 19:16 ` Andreas Dilger
  2 siblings, 0 replies; 5+ messages in thread
From: Andreas Dilger @ 2000-06-26 19:16 UTC (permalink / raw)
  To: benr; +Cc: Linux LVM mailing list

Ben writes:
> What is a partition?  By definition, a partition is a contiguous block of
> disk space.
> 
> What is an extent?  By definition, an extent is a contiguous block of disk
> space.

For the purpose of this email, I will refer to fixed-size (e.g. 4MB or 16MB)
chunks of the disk as logical blocks (LBs) and variable-sized chunks of the
disk as extents.

> What is the difference between them?  Besides the names, nothing in and of
> themselves.  What's different is how they are managed, specifically the
> structures and rules used to manage them.

I agree with this... However, unless you make your LVMS infrastructure
handle many thousands of extents in an efficient manner, it will not
work well with the existing LVM implementations.  You suggest that the
Linux/AIX LVM can be abstracted by a partition plugin, but it could be
very expensive to handle each LVM LB as an individual partition.
In practice this is not usually a problem, but I have seen (poorly
configured, mind you) AIX PVs that are fragmented into many tens of LVs,
with each LV spread over 10 disks, making many hundreds of separate
extents in a single VG.

> The Macintosh partitioning scheme was merely an example.  With the proper
> plug-ins, an LVMS based upon the architecture in the white paper would be
> able to access and manipulate logical volumes (and volume groups) created
> by AIX, for example.  Again, though, implementing support for other OS
> partitioning schemes and logical volume management schemes aids in
> migration to Linux.

I totally agree with this.  With some careful coding, the existing
Linux codebase for partition/filesystems/devices/raid can be used for
LVMS, rather than re-implementing everything and bloating the kernel.
Linux already has 90% of the functionality needed to do this - all it
requires is the framework to handle it in an abstract way like LVMS
proposes.  There will be enough people that DON'T want to use LVMS (for
whatever reason) that it should not be reworked to only work with LVMS.

> I don't understand your example.  Change the word extent to partition and
> you have:
> 
>  A 30 GB RAID5 LV could be constructed from 4 x 10 GB partitions on 4 different
> physical disks + a 10 GB hot spare partition on a fifth disk, for instance.
> 
> This is exactly what you would have under the LVMS described in the white
> paper.  Your example shows nothing which could not be done through the LVMS
> architecture with partitions.  Am I missing something here?

The real problem is that if you want to grow your volume, you need to
create 4 new extents.  If you want to grow it again, you need 4 more
extents, etc.  What it boils down to is that you really want logical
blocks that are small enough to use disk space efficiently, and handled
efficiently enough (memory/disk maps to organize them) that you can have
many hundreds/thousands to make a single filesystem.

One of the great benefits of having small LBs, as opposed to working
with large extents, is that you can easily work with individual LBs
to move/mirror/re-sync.  If you need to do the same thing with a large
extent, it can be much more CPU/disk intensive than it needs to be, or
can lock out the user/application longer than needed.

AIX LVM has a simple bitmap which tracks stale LBs in a mirror LV (maybe
caused by the disk being unavailable for a moment).  If you resync the LV,
it only copies those parts that are stale, whereas you would need to copy
the whole partition with your monolithic LVMS.  The same could be said
for re-syncing a RAID 5 volume - you would only need to re-calculate
the parity on the LB that was stale, rather than the whole partition.
If you externally keep a bitmap of blocks for mirroring/raid/remapping
within the extent, what is the point of having extents in the first place?
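The stale-LB bitmap described above can be sketched as a toy two-way mirror. This illustrates the idea and is not AIX LVM code: the `MirrorLV` class, the in-memory lists standing in for the two disks, and the list of booleans standing in for a bitmap are all assumptions. It shows that a resync copies only the LBs that missed writes.

```python
from typing import List


class MirrorLV:
    """Toy two-way mirror split into fixed-size logical blocks (LBs).

    `stale[i]` is True when LB i on the secondary copy missed a write,
    e.g. because its disk was briefly unavailable.
    """

    def __init__(self, num_lbs: int):
        self.primary: List[bytes] = [b""] * num_lbs
        self.secondary: List[bytes] = [b""] * num_lbs
        self.stale = [False] * num_lbs
        self.secondary_online = True

    def write(self, lb: int, data: bytes) -> None:
        self.primary[lb] = data
        if self.secondary_online:
            self.secondary[lb] = data
        else:
            self.stale[lb] = True  # remember that this LB must be resynced

    def resync(self) -> int:
        """Copy only the stale LBs to the secondary; return how many were copied."""
        copied = 0
        for lb, is_stale in enumerate(self.stale):
            if is_stale:
                self.secondary[lb] = self.primary[lb]
                self.stale[lb] = False
                copied += 1
        return copied


lv = MirrorLV(num_lbs=1000)
lv.secondary_online = False   # mirror disk briefly unavailable
lv.write(3, b"new data")
lv.write(700, b"more data")
lv.secondary_online = True
print(lv.resync())  # 2: only the stale LBs are copied, not all 1000
```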



This isn't to say that I don't see some benefit in using whole partitions
instead of LBs.  It could be likened to an extent-based filesystem (like XFS)
compared to a block-based filesystem like ext2.  In most cases you really
allocate the space in large chunks, so extents are good.  However, in some
cases (e.g. log files (in a filesystem), or /usr (for AIX LVM)) you tend to
allocate space in small increments and end up with many separate extents.

If you can show me how a "Linux LVM" or "AIX LVM" partition plugin can
actually work in the context of LVMS, without duplicating 90% of the
LVMS functionality, and without requiring huge amounts of disk or memory
space to handle a non-contiguous LV, then I will agree that LVMS is superior
and work on its development.

Cheers, Andreas
-- 
Andreas Dilger  \ "If a man ate a pound of pasta and a pound of antipasto,
                 \  would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/               -- Dogbert


* Re: [linux-lvm] IBM to release LVM Technology to the Linux
@ 2000-06-28  1:00 benr
  0 siblings, 0 replies; 5+ messages in thread
From: benr @ 2000-06-28  1:00 UTC (permalink / raw)
  To: Andreas Dilger; +Cc: linux-lvm



Andreas,

>For the purpose of this email, I will refer to fixed-size (e.g. 4MB or 16MB)
>chunks of the disk as logical blocks (LBs) and variable-sized chunks of the
>disk as extents.

Agreed.

>With some careful coding, the existing
>Linux codebase for partition/filesystems/devices/raid can be used for
>LVMS, rather than re-implementing everything and bloating the kernel.

Yes - there is a great deal of existing code which could be
reused/repackaged.

>There will be enough people that DON'T want to use LVMS (for
>whatever reason) that it should not be reworked to only work with LVMS.

Agreed.

>One of the great benefits of having small LBs, as opposed to working
>with large extents, is that you can easily work with individual LBs
>to move/mirror/re-sync.  If you need to do the same thing with a large
>extent, it can be much more CPU/disk intensive than it needs to be, or
>can lock out the user/application longer than needed.

Small LBs are indeed easier to move, especially when compared to large
extents.  However, when it comes to mirroring and re-sync, it depends upon
your mirroring implementation.  It is rather easy for a mirroring
implementation to divide the address space being mirrored into the
equivalent of LBs, which can then be tracked as you describe with a bitmap.
True LBs are not needed for mirroring or similar items.
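The claim that a mirroring layer can impose its own LB-like regions on one large, contiguous extent can be sketched as a dirty-region log. The 4 MB region size and the `DirtyRegionLog` name are assumptions for illustration. The key step is mapping each write's byte range onto the fixed-size regions it overlaps, so a write that straddles a boundary marks both regions.

```python
from typing import List

REGION_SIZE = 4 * 1024 * 1024  # assumed size of each simulated LB ("region")


def regions_touched(offset: int, length: int, region_size: int = REGION_SIZE) -> range:
    """Indices of the fixed-size regions overlapped by a write of `length` bytes at `offset`."""
    if length <= 0:
        return range(0)
    first = offset // region_size
    last = (offset + length - 1) // region_size
    return range(first, last + 1)


class DirtyRegionLog:
    """One bit per region of a single large, contiguous extent."""

    def __init__(self, extent_bytes: int, region_size: int = REGION_SIZE):
        self.extent_bytes = extent_bytes
        self.region_size = region_size
        self.num_regions = -(-extent_bytes // region_size)  # ceiling division
        self.bits = bytearray((self.num_regions + 7) // 8)

    def mark(self, offset: int, length: int) -> None:
        """Record a write so that only its regions need resyncing later."""
        if offset < 0 or offset + length > self.extent_bytes:
            raise ValueError("write falls outside the extent")
        for r in regions_touched(offset, length, self.region_size):
            self.bits[r // 8] |= 1 << (r % 8)

    def dirty(self) -> List[int]:
        """Regions that a resync would have to copy."""
        return [r for r in range(self.num_regions)
                if (self.bits[r // 8] >> (r % 8)) & 1]


log = DirtyRegionLog(extent_bytes=10 * 1024**3)  # one 10 GB extent
log.mark(REGION_SIZE - 512, 1024)  # a 1 KB write straddling regions 0 and 1
print(log.dirty())  # [0, 1]
```

The extent itself stays a single contiguous allocation; only the bookkeeping is divided into fixed-size pieces.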

>AIX LVM has a simple bitmap which tracks stale LBs in a mirror LV (maybe
>caused by the disk being unavailable for a moment).  If you resync the LV,
>it only copies those parts that are stale, whereas you would need to copy
>the whole partition with your monolithic LVMS.  The same could be said
>for re-syncing a RAID 5 volume - you would only need to re-calculate
>the parity on the LB that was stale, rather than the whole partition.
>If you externally keep a bitmap of blocks for mirroring/raid/remapping
>within the extent, what is the point of having extents in the first place?

As I mentioned before, LBs can be simulated for most of those cases where
they are useful.  However, your question about why have extents at all is a
good one.  The answer, basically, is co-existence, compatibility, and
usability.  Using extents (partitions) allows us to co-exist with other
operating systems on the same machine, to share a disk with another
operating system, and to access the extents (partitions) used by other
operating systems.  The question, then, is why bother with extents in
volumes?  Why not have volume groups made from extents, which are then
divided into LBs, and then the volumes constructed from the LBs?  The
answer to this is usability.  Before developing the LVMS Architecture, IBM
spent some time performing usability studies with our users.  The results
were not what we expected.  We found that users from the UNIX world were
reasonably comfortable with the standard LVM model employing volume groups.
However, there were a surprising number of users who were not.  Moving
outside of the UNIX world and into the Windoze/DOS/OS/2 world, we found that
users rejected the concept of volume groups altogether.  Many never
understood what benefits volume groups were supposed to provide, and many
of those who did felt that the extra complexity of volume groups was not
worth the supposed benefits.  As a result of what we learned from these
studies, the LVMS Architecture was developed with the idea of eliminating
volume groups but providing as many of their advantages as possible, among
other things.

>If you can show me how a "Linux LVM" or "AIX LVM" partition plugin can
>actually work in the context of LVMS, without duplicating 90% of the
>LVMS functionality, and without requiring huge amounts of disk or memory
>space to handle a non-contiguous LV, then I will agree that LVMS is superior
>and work on its development.

Well, I believe that I have only claimed that the LVMS has advantages.  I
make no claims as to it being superior.  The LVMS, like any LVM, makes
certain trade-offs.  The trade-offs made were based upon a certain set of
priorities, and not everyone has those same priorities.  Thus, beauty (and
superiority) is in the eye of the beholder.

As for how we plan to handle AIX volume groups and logical volumes, our
basic approach involves creating a set of plug-in modules.  We would have
an AIX Device Manager, an AIX Partition Manager, and one or more AIX
Feature plug-ins.

The AIX Device Manager would claim physical disks which are part of AIX
volume groups.  It would reconstruct the AIX volume groups and make each
volume group appear as a logical disk to the LVMS.  Thus, a volume group is
treated as if it were a single address space.  Each logical disk is given a
32-bit handle for use in identifying it.

The AIX Partition Manager would claim all logical disks that it recognizes
as AIX volume groups.  It would make each LB in the volume group appear to
the LVMS as a logical partition.  The logical partitions
created are each given a handle for use in identifying them.

The AIX Feature Plug-in would reconstruct the AIX logical volumes from the
LBs which appear as logical partitions.  At this point, each AIX logical
volume would appear as an aggregate, the topmost aggregate of an LVMS
volume.  Each logical volume has an LB table with one entry for each of the
LBs which are a part of the logical volume.  The order of entries in this
table corresponds to the order in which the LBs are used to back the
address space of the logical volume.  (I am assuming the simple, linked
case.  The LBs could be joined via software RAID as well, in which case a
different mechanism would be used with a different amount of overhead.)
Only the handle of the LB is stored in the table.  Thus, the size of the
table is (assuming 32 bit entries) 4 bytes per LB in the logical volume.
This table is then used as a hash table when converting the starting
address of an I/O request from being volume relative to being partition
(LB) relative.
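The Feature plug-in's LB table lookup can be sketched as follows, assuming 4 MB LBs and the simple linked (non-RAID) case described above. Because every LB has the same size, the "hash" reduces to dividing the volume-relative offset by the LB size and using the quotient as a direct index into the table. The function names and the sample handle values are hypothetical.

```python
from typing import List, Tuple

LB_SIZE = 4 * 1024 * 1024  # assumed LB size (4 MB)
HANDLE_BYTES = 4           # 32-bit handles, one table entry per LB


def volume_to_partition(lb_table: List[int], volume_offset: int,
                        lb_size: int = LB_SIZE) -> Tuple[int, int]:
    """Translate a volume-relative byte offset into
    (LB handle, offset relative to that LB / logical partition)."""
    index, within = divmod(volume_offset, lb_size)
    if not 0 <= index < len(lb_table):
        raise ValueError("offset is outside the logical volume")
    return lb_table[index], within


def lb_table_bytes(lb_table: List[int]) -> int:
    """Memory used by the table: one 32-bit handle per LB."""
    return len(lb_table) * HANDLE_BYTES


table = [0x10, 0x2A, 0x07]  # hypothetical handles of the LBs backing a 12 MB LV
print(volume_to_partition(table, LB_SIZE + 100))  # (42, 100)
print(lb_table_bytes(table))                      # 12
```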

The process of address translation which occurs for an I/O request against
an AIX logical volume can be summarized as follows:

1. The AIX Feature Plug-in translates the address from being logical volume
   relative to being logical partition relative.
2. The AIX Partition Manager translates the address from being logical
   partition relative to being logical disk relative.
3. The AIX Device Manager translates the address from being logical disk
   relative to being device relative.
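The three translation steps can be chained in a short sketch. All class names are hypothetical. The Device Manager here simply concatenates the volume group's disks into one address space and ignores the on-disk metadata areas a real AIX physical volume carries; the point is only the order in which the layers rewrite the address.

```python
from typing import Dict, List, Tuple

LB_SIZE = 4 * 1024 * 1024  # assumed 4 MB LBs


class AixFeature:
    """Step 1: logical-volume relative -> logical-partition (LB) relative."""

    def __init__(self, lb_table: List[int]):
        self.lb_table = lb_table  # LB handles in address-space order

    def translate(self, lv_offset: int) -> Tuple[int, int]:
        index, within = divmod(lv_offset, LB_SIZE)
        return self.lb_table[index], within


class AixPartitionManager:
    """Step 2: logical-partition relative -> logical-disk relative."""

    def __init__(self, lp_starts: Dict[int, int]):
        self.lp_starts = lp_starts  # LB handle -> start offset on the logical disk

    def translate(self, handle: int, lp_offset: int) -> int:
        return self.lp_starts[handle] + lp_offset


class AixDeviceManager:
    """Step 3: logical-disk relative -> device relative. The volume group's
    disks are treated as one concatenated address space (a simplification)."""

    def __init__(self, disks: List[Tuple[str, int]]):
        self.disks = disks  # (device name, size in bytes) in volume-group order

    def translate(self, disk_offset: int) -> Tuple[str, int]:
        for name, size in self.disks:
            if disk_offset < size:
                return name, disk_offset
            disk_offset -= size
        raise ValueError("offset is beyond the end of the logical disk")


def translate_io(lv_offset: int, feature: AixFeature,
                 pm: AixPartitionManager, dm: AixDeviceManager) -> Tuple[str, int]:
    handle, lp_offset = feature.translate(lv_offset)  # AIX Feature Plug-in
    disk_offset = pm.translate(handle, lp_offset)     # AIX Partition Manager
    return dm.translate(disk_offset)                  # AIX Device Manager


# Two 32 MB disks in one volume group (16 LBs); LB handle h starts at h * LB_SIZE.
dm = AixDeviceManager([("hdisk0", 8 * LB_SIZE), ("hdisk1", 8 * LB_SIZE)])
pm = AixPartitionManager({h: h * LB_SIZE for h in range(16)})
feature = AixFeature([3, 10, 5])  # the LV is backed by LBs 3, 10 and 5, in that order
print(translate_io(LB_SIZE + 42, feature, pm, dm))  # ('hdisk1', 8388650)
```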

Of course, this follows the theoretical model put forth in the white paper
and does not take into account any possible optimizations.  It also assumes
the simple linked case as opposed to the software RAID case, which would be
more difficult to calculate.

How much memory does this take?  Well, the kernel component of the LVMS is
designed to be small.  As such, it only stores the data needed for
accessing the logical volumes, logical partitions, logical disks, and
devices.  Thus, for a logical volume, it needs 4 bytes for each logical
partition it contains.  The AIX Partition Manager would need four bytes per
logical partition to store the starting address of the logical partition,
which is needed in translating the logical partition relative address into
the logical disk relative address.  The memory the AIX Device Manager would
need to translate the logical disk relative address into the physical disk
relative address is minimal, and grows with the number of disks in the
volume group being represented as a logical disk.  This memory is small in
comparison to the memory required by the AIX Feature and AIX Partition
Manager plug-ins.  Thus, at roughly 8 bytes per LB (4 for the LB table
entry plus 4 for the logical partition's starting address), the ratio of
LBs to memory that can be managed by this system should approach 131000
LBs per MB, unless my math is off.  At 4MB per LB, this would yield approx.
500 GB of filesystem space per MB of memory expended to manage the LBs
corresponding to the volume underlying the filesystem.  Of course, the
method presented here is the simplest one, chosen not to avoid anything,
but because it is the easiest to explain and calculate results for.  YMMV ;-)
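The estimate can be checked with a few lines of arithmetic, assuming 8 bytes of kernel memory per LB (a 4-byte handle in the Feature plug-in's LB table plus a 4-byte start address in the Partition Manager) and ignoring the Device Manager's small per-disk overhead:

```python
BYTES_PER_LB = 4 + 4  # LB-table handle + logical-partition start address
MB = 1024 * 1024


def lbs_per_mb(bytes_per_lb: int = BYTES_PER_LB) -> int:
    """How many LBs one megabyte of mapping memory can describe."""
    return MB // bytes_per_lb


def gb_managed_per_mb(lb_size_mb: int, bytes_per_lb: int = BYTES_PER_LB) -> float:
    """Volume space (GB) whose LBs fit in one megabyte of mapping memory."""
    return lbs_per_mb(bytes_per_lb) * lb_size_mb / 1024


print(lbs_per_mb())           # 131072
print(gb_managed_per_mb(4))   # 512.0
print(gb_managed_per_mb(16))  # 2048.0
```

With 16 MB LBs (the other size Andreas mentions), the same megabyte of mapping memory would cover about 2 TB.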

I hope the above description is adequate to give you an idea of what we are
thinking of when it comes to accessing and using AIX volume groups and
logical volumes.  As for avoiding duplicate functionality, I doubt that is
possible.  However, as the LVMS uses plug-in modules to do its work, kernel
bloat could be reduced by simply loading only those plug-in modules that
are actually going to be used.  In fact, it should be possible to program
the LVMS to identify and discard unused plug-in modules, unless there are
some limitations in the kernel that I am not currently aware of.

Regards,

Ben


end of thread, other threads:[~2000-06-28  1:00 UTC | newest]

Thread overview: 5+ messages
2000-06-22 19:37 [linux-lvm] Re: IBM to release LVM Technology to the Linux benr
2000-06-23  1:23 ` Dale Kemp
2000-06-23 20:55 ` Martin K. Petersen
2000-06-26 19:16 ` [linux-lvm] " Andreas Dilger
2000-06-28  1:00 benr
