linux-kernel.vger.kernel.org archive mirror
* Re: [RFC] Multi-path IO in 2.5/2.6 ? [OFF TOPIC]
@ 2002-09-10 15:59 Ben Rafanello
  2002-09-10 16:12 ` Alan Cox
  2002-09-10 16:33 ` Christoph Hellwig
  0 siblings, 2 replies; 4+ messages in thread
From: Ben Rafanello @ 2002-09-10 15:59 UTC
  To: linux-kernel; +Cc: lmb, alan



This post is rather long, so I apologize for that up front.

On Mon, 2002-09-09 at 12:23, Alan Cox wrote:

>Its nice clean code unlike EVMS, and doesn't duplicate half the
>kernel so its easier to hack on

It seems that nobody understands why EVMS was designed and coded
the way it was, so perhaps this is a good time to explain what
drove us in the direction we went.

EVMS was designed during the 2.3x time frame and was targeted at
the 2.4x series of kernels.  We were given some very interesting
requirements, which included:

      Support the 2.3x/2.4x series of kernels
      Support at least 1024 disks
      Support at least 1024 volumes
      Support at least 32 partitions per disk
      Support bad block relocation (potentially on every disk)
      Support drive linking
      Support existing Linux technologies (LVM1 and MD)
      Support AIX volume management on Linux
      Support OS/2 volume management on Linux
      EVMS must be expandable to support other volume managers
      Integrate all storage management functionality into a
            single, integrated system

Whether or not you agree with these requirements, these were some
of the requirements that were given to the EVMS Team.  We didn't
make them up, we just had to find ways to meet them, if possible.

In order to meet these requirements, particularly the ones
requiring support for MD, LVM1, AIX, and OS/2, we decided upon a
plug-in architecture.  We initially anticipated having at least
10 plug-in modules.  This raised problems in using the block
layer as defined in 2.3/2.4.  If we made each plug-in a device
driver for the block layer, then we would need one major number
per plug-in module, and each plug-in module would be limited in
the number of instances it could create due to the minor number
limitations.  The minor number limitation would violate the
requirement to put BBR on at least 1024 disks (unless we had
more than 1 major number per plug-in).  Furthermore, with 10
plug-ins plus EVMS itself, we did not think that we could get
one major number per plug-in.  Thus, we decided to make plug-in
modules plug into EVMS itself, instead of making each plug-in a
separate driver.  This removed all limits on the number of
plug-in modules we could have, and it removed the limit on how
many instances a plug-in module could have.  Now we only needed
one major number, the major number for EVMS itself.  (Having
only one major number for EVMS would limit the number of volumes
that EVMS could produce, which would violate the requirement for
more than 1024 volumes, but we did not think that we could get
multiple major numbers for EVMS initially.  We had hoped that
EVMS would be accepted and then, perhaps in the future, we
could get additional major numbers for EVMS.)
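
A minimal sketch of the arithmetic behind this trade-off, assuming the
8-bit minor number space of the 2.4 kernels (256 minors per major). The
function name is hypothetical and is not taken from the EVMS source.

```c
#include <assert.h>

/* Assumption: 2.4-era device numbers carry an 8-bit minor,
 * so one major number can address at most 256 devices. */
enum { MINOR_BITS = 8, MINORS_PER_MAJOR = 1 << MINOR_BITS };

/* Number of major numbers a single driver needs in order to
 * expose `devices` block devices, one minor number each. */
unsigned majors_needed(unsigned devices)
{
    assert(devices > 0);
    /* Round up: a partially used major still has to be allocated. */
    return (devices + MINORS_PER_MAJOR - 1) / MINORS_PER_MAJOR;
}
```

Under these assumptions, a BBR plug-in written as its own block driver
would need majors_needed(1024) = 4 major numbers to cover 1024 disks,
whereas a single EVMS major covers up to 256 exported volumes no matter
how many plug-in instances sit underneath them.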

Next, we needed to meet the 32 partitions per disk
requirement.  While this was not a problem for IDE disks,
it was a significant problem for SCSI disks.  The current
Linux methodology for handling disks limited the number of
partitions that could appear on a disk.  We wanted to remove
that limitation.  By having EVMS handle disk partitioning
itself, and not exposing the partitions it found (thereby
eliminating the need for a minor number for each partition),
we could have as many partitions on a disk as we wanted.
If the user wanted a partition to be a volume, we could
easily do that, at which point the partition would consume
one of the EVMS minor numbers.
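
A rough sketch of why the limit differs between SCSI and IDE, assuming
(as I understand the 2.4 drivers) that the SCSI disk driver reserves 4
bits of the minor number for the partition index and the IDE driver
reserves 6. The function names are hypothetical.

```c
#include <assert.h>

enum { MINOR_BITS = 8 };   /* assumed width of a 2.4 minor number */

/* Partitions addressable per disk when `part_bits` bits of the minor
 * are reserved for the partition index. Index 0 names the whole disk,
 * so it is not available for a partition. */
unsigned max_partitions(unsigned part_bits)
{
    assert(part_bits <= MINOR_BITS);
    return (1u << part_bits) - 1;
}

/* Minor number for partition `part` of disk `disk` under that split. */
unsigned make_minor(unsigned disk, unsigned part, unsigned part_bits)
{
    assert(part <= max_partitions(part_bits));
    assert(disk < (1u << (MINOR_BITS - part_bits)));
    return (disk << part_bits) | part;
}
```

With 4 partition bits a disk tops out at max_partitions(4) = 15
partitions, short of the required 32, while 6 bits give 63. A design
that assigns a minor only to a finished volume sidesteps this split
entirely.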

A potential side benefit of having the plug-ins plug in to EVMS
directly is that, in volumes where you have multiple layers in
use (such as LVM on top of MD), there may be a performance
benefit.  I say maybe because I have not measured it, so I
can't say for sure.  But, if each plug-in was a separate driver,
the transition from one layer to another in the volume (ex.
going from LVM to MD) would require the overhead of a request
queue, which would be the communication mechanism between layers.
With all of the plug-ins inside of EVMS, the transition from one
layer to another becomes just an indirect function call.  While
this would use more stack space, we calculated that a heavily
layered volume would require about 200 bytes of stack space
on IA32, a value that we felt was acceptable.
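
The layering described above can be sketched roughly as follows. All
names here (struct layer, remap_submit, resolve) are hypothetical and
are not the EVMS plug-in interface; the sketch only shows one layer
handing a request to the next through a function pointer, with no
queue in between.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request: only the target sector matters here. */
struct io_req { unsigned long sector; };

/* Hypothetical layer in a volume stack (e.g. LVM on top of MD). */
struct layer {
    const char *name;
    struct layer *lower;      /* next layer down, NULL at the bottom */
    unsigned long offset;     /* sector remapping applied by this layer */
    void (*submit)(struct layer *self, struct io_req *req);
};

/* Remap the request, then pass it down with one indirect call. */
void remap_submit(struct layer *self, struct io_req *req)
{
    req->sector += self->offset;
    if (self->lower != NULL)
        self->lower->submit(self->lower, req);
}

/* Push a sector through the whole stack and return where it lands. */
unsigned long resolve(struct layer *top, unsigned long sector)
{
    struct io_req req = { sector };
    assert(top != NULL && top->submit != NULL);
    top->submit(top, &req);
    return req.sector;
}
```

Each layer adds one stack frame per request, which is the stack-space
cost mentioned above; in exchange, no per-layer request queue is
needed.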

Thus, by having plug-in modules plug in to EVMS directly
instead of using the block layer, we eliminated every
limitation that would otherwise have applied to EVMS, with
one exception: the number of volumes that EVMS can produce.
Meeting the requirement for 1024 volumes would require
additional major numbers for EVMS.

As for LVM and MD, after examining the LVM code we felt that
the existing LVM code would be more difficult to port to
the EVMS framework than if we wrote the LVM piece from
scratch.  With MD, on the other hand, we felt that we could
successfully port it to the EVMS framework.  Also, it seemed
that MD was held in high regard by the Community, which gave
us confidence in the code and increased our desire to port it
as opposed to rewrite it.  However, we now have two code
bases for MD.  We are currently looking for ways to get back
to only one MD code base.

I hope this clears up some things about the design and
implementation of EVMS.  Since this post is already too
long, I'll talk about Multi-path I/O and EVMS in another
one.

Regards,

Ben Rafanello
EVMS Team Lead
IBM Linux Technology Center
(512) 838-4762
benr@us.ibm.com




* Re: [RFC] Multi-path IO in 2.5/2.6 ? [OFF TOPIC]
  2002-09-10 15:59 [RFC] Multi-path IO in 2.5/2.6 ? [OFF TOPIC] Ben Rafanello
@ 2002-09-10 16:12 ` Alan Cox
  2002-09-10 16:33 ` Christoph Hellwig
  1 sibling, 0 replies; 4+ messages in thread
From: Alan Cox @ 2002-09-10 16:12 UTC
  To: Ben Rafanello; +Cc: linux-kernel, lmb

On Tue, 2002-09-10 at 16:59, Ben Rafanello wrote:
> It seems that nobody understands why EVMS was designed and coded
> the way it was, so perhaps this is a good time to explain what
> drove us in the direction we went.

I'm looking at things purely from a "where are we now, how do we get
XYZ feature" viewpoint. We have some history in the kernel and it's a
pain to deal with before we import history that isn't needed from
another project.




* Re: [RFC] Multi-path IO in 2.5/2.6 ? [OFF TOPIC]
  2002-09-10 15:59 [RFC] Multi-path IO in 2.5/2.6 ? [OFF TOPIC] Ben Rafanello
  2002-09-10 16:12 ` Alan Cox
@ 2002-09-10 16:33 ` Christoph Hellwig
  1 sibling, 0 replies; 4+ messages in thread
From: Christoph Hellwig @ 2002-09-10 16:33 UTC
  To: Ben Rafanello; +Cc: linux-kernel, lmb, alan

>       Support at least 1024 disks
>       Support at least 1024 volumes
>       Support at least 32 partitions per disk

At least those criteria aren't achieved.



* Re: [RFC] Multi-path IO in 2.5/2.6 ? [OFF TOPIC]
@ 2002-09-10 17:27 Ben Rafanello
  0 siblings, 0 replies; 4+ messages in thread
From: Ben Rafanello @ 2002-09-10 17:27 UTC
  To: Christoph Hellwig; +Cc: alan, linux-kernel, lmb


On Tue, 2002-09-10 at 11:33, Christoph Hellwig wrote:
>>       Support at least 1024 disks
>>       Support at least 1024 volumes
>>       Support at least 32 partitions per disk
>
>At least those criteria aren't achieved.

EVMS has no design limit on the number of disks
that it can support, and, the way it is coded,
the only limiting factor on the number of disks
it can support is the amount of memory available.
Any limits on the number of disks that EVMS can
use come from Linux itself, and, as such, are
beyond the control of the EVMS Team.  We are not
out to rewrite the kernel and the device drivers -
we are just trying to meet the requirements we
were given.

As for 1024 volumes, that limit is due to the
fact that EVMS has only 1 major number, as
discussed in the original post.  If EVMS were
allowed more major numbers, then this criterion
could be reached or exceeded.  However, we thought
it extremely unlikely that EVMS would be given
enough major numbers under 2.4x to reach this goal,
so we coded accordingly.  Should a miracle occur
and EVMS be given another three major numbers,
we could easily update our code to make use of
these extra major numbers and achieve 1024 volumes.
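
As a sketch of what such an update might involve, assuming 256 minors
per major as in 2.4: four major numbers together provide 4 x 256 = 1024
minor numbers, and a volume index can be split into a (major slot,
minor) pair. The names below are hypothetical and not from the EVMS
code.

```c
#include <assert.h>

enum { MINORS_PER_MAJOR = 256 };   /* assumed 8-bit minor space */

/* Which of the driver's major numbers a volume lands on, and the
 * minor number it gets within that major. */
struct dev_slot {
    unsigned major_index;   /* 0 = first major owned by the driver */
    unsigned minor;
};

/* Map the n-th volume (counting from 0) onto `majors` major numbers. */
struct dev_slot slot_for_volume(unsigned n, unsigned majors)
{
    struct dev_slot s;
    assert(n < majors * MINORS_PER_MAJOR);   /* out of device numbers */
    s.major_index = n / MINORS_PER_MAJOR;
    s.minor = n % MINORS_PER_MAJOR;
    return s;
}
```

With a single major the assertion trips at volume index 256; with four
majors, indices 0 through 1023 are all valid.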

As for partitions, EVMS has no fixed limit on the
number of partitions per disk.  The limit for
partitions on a disk depends upon the size of the
disk and the disk partitioning scheme used.  I
currently run a stress test on EVMS where I create
405 partitions on a single SCSI disk.  These
partitions are then combined in various ways to
form volumes, which are then formatted and have
I/O tests performed on them.  All 405 partitions
are used.  Under EVMS, you could take any of
those partitions and turn it into a volume, which
makes it accessible. The only limit on EVMS is
the number of volumes it can create, and this is
due to EVMS having only 1 major number under 2.4x.
So not only did we meet this requirement, we greatly
exceeded it.

Regards,

Ben Rafanello
EVMS Team Lead
IBM Linux Technology Center
(512) 838-4762
benr@us.ibm.com




