linux-kernel.vger.kernel.org archive mirror
* A possible direction for the next LVM driver
@ 2001-08-30 15:45 Joe Thornber
  2001-08-30 15:55 ` Alan Cox
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Joe Thornber @ 2001-08-30 15:45 UTC (permalink / raw)
  To: linux-kernel

Hi,

I'm working on the next iteration of the LVM driver, specifically
trying to address the criticism directed at the rather ugly ioctl
interface.  The code has reached the stage where it works and it's
possible to see what I'm aiming for.  I would appreciate it if people
could spare the time to review this and give me feedback.  If there is
general agreement that this is moving in the right direction then the
next major version of LVM may be based around a future version of this
driver.  Please CC me in replies.  The code can be found at:

ftp://ftp.sistina.com/pub/LVM2/device-mapper/device-mapper.tar.bz2


The main goal of this driver is to support volume management in
general, not just for LVM.  The kernel should provide general
services, not support specific applications.  eg, The driver has no
concept of volume groups.

The driver does this by mapping sector ranges for the logical device
onto 'targets'.

When the logical device is accessed, the make_request function looks
up the correct target for the given sector, and then asks this target
to do the remapping.

A btree structure is used to hold the sector range -> target mapping.
Since we know all the entries in the btree in advance we can make a
very compact tree, omitting pointers to child nodes (child node
locations can be calculated).  Typical users would find they only have
a handful of targets for each logical volume (LV).
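
A minimal userspace sketch of such a pointer-free tree follows.  It is
not taken from the driver: the node width, the fixed-size arrays and
all names (sector_index, index_build, index_find) are assumptions made
for illustration.  Each leaf key is the last sector of one target; each
internal key is the highest key below the corresponding child, and
child k of node n on one level is simply node n * CHILDREN_PER_NODE + k
on the next level, so no child pointers are stored.

```c
#include <stdint.h>

#define KEYS_PER_NODE     4
#define CHILDREN_PER_NODE (KEYS_PER_NODE + 1)
#define MAX_DEPTH         8      /* sketch-only limits */
#define MAX_KEYS          256

typedef uint32_t sector_t;
#define SECTOR_MAX UINT32_MAX

struct sector_index {
	int depth;                  /* number of levels; leaves are at depth - 1 */
	int num;                    /* number of real leaf keys (targets) */
	int counts[MAX_DEPTH];      /* nodes per level */
	sector_t keys[MAX_DEPTH][MAX_KEYS];
};

static int div_up(int n, int d) { return (n + d - 1) / d; }

/* Highest key below node n of level l, or SECTOR_MAX if the rightmost
 * leaf of that subtree does not exist. */
static sector_t high(const struct sector_index *t, int l, int n)
{
	for (; l < t->depth - 1; l++)
		n = n * CHILDREN_PER_NODE + (CHILDREN_PER_NODE - 1);
	if (n >= t->counts[l])
		return SECTOR_MAX;
	return t->keys[l][n * KEYS_PER_NODE + KEYS_PER_NODE - 1];
}

/* highs[i] is the last sector of target i, in ascending order. */
static int index_build(struct sector_index *t, const sector_t *highs, int num)
{
	int leaves = div_up(num, KEYS_PER_NODE);
	int depth = 1, n, l, k, i;

	for (n = leaves; n > 1; n = div_up(n, CHILDREN_PER_NODE))
		depth++;
	if (num < 1 || depth > MAX_DEPTH || leaves * KEYS_PER_NODE > MAX_KEYS)
		return -1;

	t->depth = depth;
	t->num = num;
	t->counts[depth - 1] = leaves;
	for (i = 0; i < leaves * KEYS_PER_NODE; i++)   /* pad the last leaf */
		t->keys[depth - 1][i] = i < num ? highs[i] : SECTOR_MAX;

	for (l = depth - 2; l >= 0; l--) {             /* internal levels, bottom up */
		t->counts[l] = div_up(t->counts[l + 1], CHILDREN_PER_NODE);
		for (n = 0; n < t->counts[l]; n++)
			for (k = 0; k < KEYS_PER_NODE; k++)
				t->keys[l][n * KEYS_PER_NODE + k] =
					high(t, l + 1, n * CHILDREN_PER_NODE + k);
	}
	return 0;
}

/* Returns the index of the target holding 'sector', or -1 if past the end. */
static int index_find(const struct sector_index *t, sector_t sector)
{
	int l, n = 0, k = 0;

	if (sector > t->keys[t->depth - 1][t->num - 1])
		return -1;
	for (l = 0; l < t->depth; l++) {
		const sector_t *node;

		n = n * CHILDREN_PER_NODE + k;         /* calculated child position */
		node = &t->keys[l][n * KEYS_PER_NODE];
		for (k = 0; k < KEYS_PER_NODE; k++)
			if (node[k] >= sector)
				break;
	}
	return n * KEYS_PER_NODE + k;
}
```

With only a handful of targets the whole index fits in a single leaf
node, so a lookup is one short linear scan.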

Benchmarking with bonnie++ suggests that this is certainly no slower
than current LVM.


Target types are not hard coded; instead, each one is registered by
calling the register_mapping_type function.  A target type is specified
using three functions (see the header):

dm_ctr_fn - takes a string and constructs a target-specific piece of
            context data.
dm_dtr_fn - destroys a context.
dm_map_fn - function that takes a buffer_head and some previously
            constructed context and performs the remapping.
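
As a rough userspace illustration of how three such functions fit
together, here is a sketch of a 'linear' target.  The function pointer
signatures, struct request (a stand-in for the kernel's buffer_head)
and struct target_type are all assumptions made for this example; the
real prototypes are the ones in the driver's header, and registration
via register_mapping_type is not shown.

```c
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>

typedef uint32_t sector_t;

/* Stand-in for buffer_head: only the fields this sketch remaps. */
struct request {
	unsigned dev_major, dev_minor;
	sector_t sector;
};

/* Hypothetical signatures, chosen for the sketch. */
typedef int  (*dm_ctr_fn)(const char *args, void **context);
typedef void (*dm_dtr_fn)(void *context);
typedef int  (*dm_map_fn)(struct request *rq, sector_t begin, void *context);

struct target_type {
	const char *name;
	dm_ctr_fn ctr;
	dm_dtr_fn dtr;
	dm_map_fn map;
};

struct linear_ctx {
	unsigned major, minor;
	sector_t start;             /* first sector used on the underlying device */
};

/* Parse "<major> <minor> <start>" into a freshly allocated context. */
static int linear_ctr(const char *args, void **context)
{
	struct linear_ctx *lc = malloc(sizeof(*lc));

	if (!lc)
		return -1;
	if (sscanf(args, "%u %u %" SCNu32,
		   &lc->major, &lc->minor, &lc->start) != 3) {
		free(lc);
		return -1;
	}
	*context = lc;
	return 0;
}

static void linear_dtr(void *context)
{
	free(context);
}

/* Redirect the request to the underlying device; 'begin' is the first
 * logical sector covered by this target. */
static int linear_map(struct request *rq, sector_t begin, void *context)
{
	const struct linear_ctx *lc = context;

	rq->dev_major = lc->major;
	rq->dev_minor = lc->minor;
	rq->sector = lc->start + (rq->sector - begin);
	return 0;
}

static const struct target_type linear_type = {
	"linear", linear_ctr, linear_dtr, linear_map
};
```

The context is opaque to the caller, so each target type is free to
keep whatever state its map function needs.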

Currently there are two trivial mappers, which are automatically
registered: 'linear', and 'io_error'.  Linear alone is enough to
implement most of LVM.


I do not like ioctl interfaces so this driver is currently controlled
through a /proc interface.  /proc/device-mapper/control allows you to
create and remove devices by 'cat'ing a line of the following format:

create <device name> [minor no]
remove <device name>

If you're not using devfs you'll have to do the mknod'ing yourself,
otherwise the device will appear in /dev/device-mapper automatically.

/proc/device-mapper/<device name> accepts the mapping table:

begin
<sector start> <length> <target name> <target args>...
...
end

where <target args> are specific to the target type, eg. for a linear
mapping:

<sector start> <length> linear <major> <minor> <start>

and the io-err mapping:

<sector start> <length> io-err
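
For illustration, one way a table line of this shape could be split
into its fields is sketched below.  This is not the driver's parser;
struct table_line and parse_table_line are hypothetical names, and the
16-byte limit on the target name is an arbitrary choice for the sketch.
The target-specific arguments are left unparsed so they can be handed
to the target's constructor as a string.

```c
#include <inttypes.h>
#include <stdio.h>

struct table_line {
	uint32_t start;             /* <sector start> */
	uint32_t length;            /* <length> */
	char target[16];            /* <target name> */
	const char *args;           /* rest of the line, may be empty */
};

/* Returns 0 on success, -1 if the line does not have the shape
 * "<sector start> <length> <target name> [<target args>...]". */
static int parse_table_line(const char *line, struct table_line *out)
{
	int consumed = 0;

	if (sscanf(line, "%" SCNu32 " %" SCNu32 " %15s %n",
		   &out->start, &out->length, out->target, &consumed) != 3)
		return -1;
	if (out->length == 0)
		return -1;

	/* %n records where the target args begin; fall back to an empty
	 * string if nothing follows the target name. */
	out->args = consumed > 0 ? line + consumed : "";
	return 0;
}
```

Reporting the number of the first line for which such a function fails
would be one way to tell the user which line of a table was erroneous.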

The begin/end lines around the table are nasty; they should be handled
by open/close of the file.

The interface is far from complete.  Currently loading a table either
succeeds or fails, and you have no way of knowing which line of the
mapping table was erroneous.  Also there is no way to get status
information out, though this should be easy to add, either as another
/proc file, or just by reading the same /proc/device-mapper/<device>
file.  I will be separating the loading and validation of a table from
the binding of a valid table to a device.

It has been suggested that I should implement a little custom
filesystem rather than labouring with /proc.  For example doing a
mkdir foo in /wherever/device-mapper would create a new device. People
waiting for a status change (eg, a mirror operation to complete) could
poll a file.  Does the community find this an acceptable way to go?


At the moment the table assumes 32 bit keys (sectors).  The move to 64
bits will involve no interface changes, since the tables will be read
in as ascii data.  A different table implementation can therefore be
provided at another time, either just by changing offset_t to 64
bits, or maybe by implementing a structure which looks up the keys in
stages (ie, 32 bits at a time).

More interesting targets:

striped mapping; given a stripe size and a number of device regions
this would stripe data across the regions.  Especially useful, since
we could limit each striped region to a 32 bit area and then avoid
nasty 64 bit %'s.
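
The arithmetic for such a target could look like the sketch below,
which is an assumption about how the remapping might be done rather
than code from the driver (stripe_map and struct stripe_loc are
hypothetical names).  Everything is 32 bit, so no 64 bit division or
modulus is needed.

```c
#include <stdint.h>

struct stripe_loc {
	uint32_t region;            /* which device region holds the sector */
	uint32_t offset;            /* sector offset within that region */
};

/* Map a logical sector of the striped target onto one of 'regions'
 * equally sized regions, using stripes of 'stripe_sectors' sectors. */
static struct stripe_loc stripe_map(uint32_t sector,
				    uint32_t stripe_sectors,
				    uint32_t regions)
{
	uint32_t stripe = sector / stripe_sectors;   /* logical stripe number */
	struct stripe_loc loc;

	loc.region = stripe % regions;
	loc.offset = (stripe / regions) * stripe_sectors
		     + sector % stripe_sectors;
	return loc;
}
```

For example, with 16 sector stripes across 3 regions, logical sector 50
falls in stripe 3, which lands on region 0 at offset 18.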

mirror mapping; would set off a kernel thread slowly copying data from
one region to another, ensuring that any new writes got copied to both
destinations correctly.  This would enable us to implement a live
pvmove correctly.

- Joe


* Re: A possible direction for the next LVM driver
  2001-08-30 15:45 A possible direction for the next LVM driver Joe Thornber
@ 2001-08-30 15:55 ` Alan Cox
  2001-08-31  9:20 ` Jens Axboe
  2001-12-14 19:11 ` Alasdair G Kergon
  2 siblings, 0 replies; 6+ messages in thread
From: Alan Cox @ 2001-08-30 15:55 UTC (permalink / raw)
  To: Joe Thornber; +Cc: linux-kernel

> When the logical device is accessed, the make_request function looks
> up the correct target for the given sector, and then asks this target
> to do the remapping.

Interesting.

> A btree structure is used to hold the sector range -> target mapping.
> Since we know all the entries in the btree in advance we can make a
> very compact tree, omitting pointers to child nodes (child node
> locations can be calculated).  Typical users would find they only have
> a handful of targets for each logical volume (LV).
> Benchmarking with bonnie++ suggests that this is certainly no slower
> than current LVM.

Will it represent single segment filesystems as one node (ie extremely
efficiently)?  The reason I ask is that one thing EVMS does that I think is
right is that it lets you throw away the whole partitioning business.
Instead, DOS partition format, BSD disklabel etc are simply very limited
logical volumes.



* Re: A possible direction for the next LVM driver
  2001-08-30 15:45 A possible direction for the next LVM driver Joe Thornber
  2001-08-30 15:55 ` Alan Cox
@ 2001-08-31  9:20 ` Jens Axboe
  2001-08-31  9:35   ` Joe Thornber
  2001-12-14 19:11 ` Alasdair G Kergon
  2 siblings, 1 reply; 6+ messages in thread
From: Jens Axboe @ 2001-08-31  9:20 UTC (permalink / raw)
  To: Joe Thornber; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 883 bytes --]

On Thu, Aug 30 2001, Joe Thornber wrote:
> Hi,
> 
> I'm working on the next iteration of the LVM driver, specifically
> trying to address the criticism directed at the rather ugly ioctl
> interface.  The code has reached the stage where it works and it's
> possible to see what I'm aiming for.  I would appreciate it if people
> could spare the time to review this and give me feedback.  If there is
> general agreement that this is moving in the right direction then the
> next major version of LVM may be based around a future version of this
> driver.  Please CC me in replies.  The code can be found at:
> 
> ftp://ftp.sistina.com/pub/LVM2/device-mapper/device-mapper.tar.bz2

Looks interesting, here's a patch to fix a possible infinite loop in your
make_request_fn. Another quick note -- you might want to consider
slab'ifying the io_hook allocation/deallocation...

-- 
Jens Axboe


[-- Attachment #2: dm-1 --]
[-- Type: text/plain, Size: 513 bytes --]

--- dm.c~	Fri Aug 31 11:15:24 2001
+++ dm.c	Fri Aug 31 11:16:36 2001
@@ -346,7 +346,7 @@
 	int r, minor = MINOR(bh->b_rdev);
 
 	if (minor >= MAX_DEVICES)
-		return -ENXIO;
+		goto bad_rl;
 
 	rl;
 	md = _devs[minor];
@@ -359,10 +359,8 @@
 		ru;
 		r = queue_io(md, bh, rw);
 
-		if (r < 0) {
-			buffer_IO_error(bh);
-			return 0;
-
+		if (r < 0)
+			goto bad_rl;
 		} else if (r > 0)
 			return 0; /* deferred successfully */
 
@@ -377,6 +375,7 @@
 
  bad:
 	ru;
+ bad_rl:
 	buffer_IO_error(bh);
 	return 0;
 }


* Re: A possible direction for the next LVM driver
  2001-08-31  9:20 ` Jens Axboe
@ 2001-08-31  9:35   ` Joe Thornber
  2001-08-31  9:37     ` Jens Axboe
  0 siblings, 1 reply; 6+ messages in thread
From: Joe Thornber @ 2001-08-31  9:35 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Joe Thornber, linux-kernel

On Fri, Aug 31, 2001 at 11:20:20AM +0200, Jens Axboe wrote:
> On Thu, Aug 30 2001, Joe Thornber wrote:
> > Hi,
> > 
> > I'm working on the next iteration of the LVM driver, specifically
> > trying to address the criticism directed at the rather ugly ioctl
> > interface.  The code has reached the stage where it works and it's
> > possible to see what I'm aiming for.  I would appreciate it if people
> > could spare the time to review this and give me feedback.  If there is
> > general agreement that this is moving in the right direction then the
> > next major version of LVM may be based around a future version of this
> > driver.  Please CC me in replies.  The code can be found at:
> > 
> > ftp://ftp.sistina.com/pub/LVM2/device-mapper/device-mapper.tar.bz2
> 
> Looks interesting, here's patch to fix possible infinite loop in your
> make_request_fn.

Great, thanks.

> Another quick note -- you might want to consider
> slab'ifying the io_hook allocation/deallocation...

yes, I'd thought of this, hence the comment ...

/* FIXME: These should have their own slab */
inline static struct io_hook *alloc_io_hook(void)

I'll change that now.

- Joe


* Re: A possible direction for the next LVM driver
  2001-08-31  9:35   ` Joe Thornber
@ 2001-08-31  9:37     ` Jens Axboe
  0 siblings, 0 replies; 6+ messages in thread
From: Jens Axboe @ 2001-08-31  9:37 UTC (permalink / raw)
  To: Joe Thornber; +Cc: linux-kernel

On Fri, Aug 31 2001, Joe Thornber wrote:
> On Fri, Aug 31, 2001 at 11:20:20AM +0200, Jens Axboe wrote:
> > On Thu, Aug 30 2001, Joe Thornber wrote:
> > > Hi,
> > > 
> > > I'm working on the next iteration of the LVM driver, specifically
> > > trying to address the criticism directed at the rather ugly ioctl
> > > interface.  The code has reached the stage where it works and it's
> > > possible to see what I'm aiming for.  I would appreciate it if people
> > > could spare the time to review this and give me feedback.  If there is
> > > general agreement that this is moving in the right direction then the
> > > next major version of LVM may be based around a future version of this
> > > driver.  Please CC me in replies.  The code can be found at:
> > > 
> > > ftp://ftp.sistina.com/pub/LVM2/device-mapper/device-mapper.tar.bz2
> > 
> > Looks interesting, here's patch to fix possible infinite loop in your
> > make_request_fn.
> 
> Great, thanks.

I missed a '}', jfyi

> > Another quick note -- you might want to consider
> > slab'ifying the io_hook allocation/deallocation...
> 
> yes, I'd thought of this, hence the comment ...
> 
> /* FIXME: These should have their own slab */
> inline static struct io_hook *alloc_io_hook(void)
> 
> I'll change that now.

Oh, didn't spot that... But yes, it's a really good idea.

-- 
Jens Axboe



* Re: A possible direction for the next LVM driver
  2001-08-30 15:45 A possible direction for the next LVM driver Joe Thornber
  2001-08-30 15:55 ` Alan Cox
  2001-08-31  9:20 ` Jens Axboe
@ 2001-12-14 19:11 ` Alasdair G Kergon
  2 siblings, 0 replies; 6+ messages in thread
From: Alasdair G Kergon @ 2001-12-14 19:11 UTC (permalink / raw)
  To: linux-kernel

On Thu, Aug 30, 2001 at 04:45:47PM +0100, Joe Thornber wrote:
> I'm working on the next iteration of the LVM driver
...which is now known as the "device-mapper" because it lets you define 
new block devices that map I/O onto sections of other block devices.

> The main goal of this driver is to support volume management in
> general, not just for LVM.  The kernel should provide general
> services, not support specific applications.  eg, The driver has no
> concept of volume groups.
 
The latest version being tested is at:
  ftp://ftp.sistina.com/pub/LVM2/device-mapper/device-mapper-0.90.02.tgz
 
The tgz file contains a CVS snapshot which includes patches against 2.4.16
and some documentation (and details for the CVS repository).

Currently there's a choice between an ioctl interface and a filesystem 
interface (dmfs).
 
Example
=======
To create a "logical volume" that concatenates /dev/sdc1 with /dev/sdd2:
[units used below are 512-byte sectors]

# cat > /tmp/lv1_table
0 1028160 linear /dev/sdc1 0
1028160 3903762 linear /dev/sdd2 0
^D
# dmsetup create lv1 /tmp/lv1_table
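
The two table lines above are contiguous: the second segment starts at
sector 1028160, exactly where the first one ends, giving a device of
1028160 + 3903762 = 4931922 sectors.  A small check of that property
might look like the sketch below; struct segment and table_size are
hypothetical names, and whether the driver itself insists on a
gap-free table is an assumption here.

```c
#include <stdint.h>

struct segment {
	uint32_t start;             /* first logical sector of the segment */
	uint32_t length;            /* length in 512-byte sectors */
};

/* Returns the total number of sectors covered, or 0 unless the
 * segments start at sector 0 and follow one another with no gaps or
 * overlaps. */
static uint32_t table_size(const struct segment *seg, int n)
{
	uint32_t next = 0;
	int i;

	for (i = 0; i < n; i++) {
		if (seg[i].start != next || seg[i].length == 0)
			return 0;
		next += seg[i].length;
	}
	return next;
}
```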

With the filesystem interface and devfs, you could also create 
/devfs/device-mapper/lv1 by hand as follows:

# mkdir /tmp/dmfs; mount -t dmfs dmfs /tmp/dmfs
# mkdir /tmp/dmfs/lv1
# cp /tmp/lv1_table /tmp/dmfs/lv1/table

Striping is also already supported (see documentation).

Alasdair
-- 
agk@uk.sistina.com
