* Improving dm-mirror as a final year project
@ 2011-01-25 15:53 Miklos Vajna
  2011-01-26 17:24 ` nishant mungse
  2011-02-01 16:12 ` Jonathan Brassow
  0 siblings, 2 replies; 14+ messages in thread
From: Miklos Vajna @ 2011-01-25 15:53 UTC (permalink / raw)
  To: dm-devel



Hi,

I got the opportunity to work on dm-mirror as a final year project at
ULX, the Hungarian distributor of Red Hat.

To get my feet wet, I created two small patches:

1) dm-mirror: allow setting ios count to 0

Always read from the default_mirror in that case.

2) dm-mirror: allow setting the default mirror

These can help when one data leg of a mirror is a remote (iSCSI) one,
so the default round-robin (RR) approach is not optimal for reading.
(One may set the ios count to 0 and set the default mirror to the local
leg, which speeds up reads.)
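
For concreteness, the intended usage via the dm message interface would
look like this (just a sketch: the device name "mirror" and the leg
name "8:17" are made-up examples, and the syntax follows the
mirror_message() parser in the patches posted later in this thread):

----
# route all reads to the default mirror by disabling round robin
dmsetup message mirror 0 io_balance round_robin ios 0
# make the local leg the default (read) mirror; the leg is named as it
# appears in 'dmsetup table mirror' output
dmsetup message mirror 0 io_balance default 8:17
----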

I do not yet have the right to send those patches (I do this in
university time, so the copyright is not mine), but I hope to be able to
do so - to get them reviewed.

So the final year project's aim is to improve "the fault tolerance and
performance of dm-mirror". We (my mentors and I) have two ideas in that
area, not counting the above patches:

1) Make the currently hardwired RR read approach modular, which would
allow implementing, for example, a weighted RR algorithm (useful when
one disk is twice as fast as the other, etc.).

2) From our experiments, it seems that when dm-mirror loses one of its
legs and there is a write to the mirror, it gets converted to a linear
volume. It would be nice (though I am not sure how easy) to use the
mirror log to mark the dirty blocks, so that the volume is not
converted to a linear one: once the other leg is back, it could be
resynchronized based on the mirror log.

The question: what do you (who have much more experience with dm-mirror
than I do) think - are these reasonable goals? If not, what would you
improve/change/add/remove in the above list?

Thanks,

Miklos

PS: I'm not subscribed, please keep me in CC.





* Re: Improving dm-mirror as a final year project
  2011-01-25 15:53 Improving dm-mirror as a final year project Miklos Vajna
@ 2011-01-26 17:24 ` nishant mungse
  2011-01-26 20:04   ` Malahal Naineni
  2011-02-01 16:12 ` Jonathan Brassow
  1 sibling, 1 reply; 14+ messages in thread
From: nishant mungse @ 2011-01-26 17:24 UTC (permalink / raw)
  To: device-mapper development



On Tue, Jan 25, 2011 at 10:53 AM, Miklos Vajna <vmiklos@frugalware.org> wrote:

> Hi,
>
> I got the opportunity to work on dm-mirror as a final year project at
> ULX, the Hungarian distributor of Red Hat.
>
> To get my feet wet, I created two small patches:
>
> 1) dm-mirror: allow setting ios count to 0
>
> Always read from the default_mirror in that case.
>
> 2) dm-mirror: allow setting the default mirror
>
> These can help in case one data leg of a mirror is a remote (iSCSI) one,
> so the default RR approach is not optimal for reading. (One may set the
> ios count to 0, set the default mirror to the local one, and that will
> cause a read speedup.)
>
> I do not yet have the right to send those patches (I do this in
> university time, so the copyright is not mine), but I hope to be able to
> do so - to get them reviewed.
>
> So the final year project's aim is to improve "the fault tolerance and
> performance of dm-mirror". We (I and my mentors) have two ideas in that
> area, not counting the above patches:
>
> 1) Make the currently hardwired RR read approach modular, that would
> allow implementing for example a weighted RR algorithm. (In case one
> disk is two times faster than the other one, etc.)
>
> 2) From our experiments, it seems that in case the dm-mirror loses one
> of its legs and there is a write to the mirror, it gets converted to a
> linear volume. It would be nice (not sure how easy) to use the mirror
> log to mark the dirty blocks, so that the volume would not be converted
> to a linear one: once the other leg is back, it could be updated based
> on the mirror log.
>
> The question: what do you (who have much more experience with dm-mirror
> than me) think - are these reasonable goals? If not, what would you
> improve/change/add/remove to the above list?
>
> Thanks,
>
> Miklos
>
> PS: I'm not subscribed, please keep me in CC.
>


Hello all,

I have a question about dm-raid1. As far as I know, in dm-raid1 the
data is written in parallel to all the mirrors of a mirror set, and if
any mirror fails to write the data, dm-mirror adds that mirror to the
fail list by incrementing the error count in the fail_mirror() function
in dm-raid1. My question is: where is this error count decremented -
before or after kcopyd, and where exactly?


Regards,
Nishant.





* Re: Improving dm-mirror as a final year project
  2011-01-26 17:24 ` nishant mungse
@ 2011-01-26 20:04   ` Malahal Naineni
  2011-01-27  8:28     ` nishant mungse
  0 siblings, 1 reply; 14+ messages in thread
From: Malahal Naineni @ 2011-01-26 20:04 UTC (permalink / raw)
  To: dm-devel

nishant mungse [nishantmungse@gmail.com] wrote:
> 
>    Hello all,
>    I have a question about dm-raid1. As far as I know, in dm-raid1 the
>    data is written in parallel to all the mirrors of a mirror set, and
>    if any mirror fails to write the data, dm-mirror adds that mirror to
>    the fail list by incrementing the error count in the fail_mirror()
>    function in dm-raid1.
>    My question is: where is this error count decremented - before or
>    after kcopyd, and where exactly?

There is no actual 'fail list' of mirror legs. The error_count is never
decremented. The 'error_count' is only used to tell if the mirror leg
has encountered errors. The only way to clear that counter is to reload
the table!

See the comment:

        /*
         * error_count is used for nothing more than a
         * simple way to tell if a device has encountered
         * errors.
         */
        atomic_inc(&m->error_count);
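
For instance, reloading the unchanged table resets it (a sketch,
assuming the mirror device is named "mirror"):

----
# feed the current table straight back in, then swap it in;
# the newly constructed mirror starts with a fresh error_count
dmsetup table mirror | dmsetup reload mirror
dmsetup suspend mirror
dmsetup resume mirror
----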

Hope that helps,
Malahal.



* Re: Improving dm-mirror as a final year project
  2011-01-26 20:04   ` Malahal Naineni
@ 2011-01-27  8:28     ` nishant mungse
  0 siblings, 0 replies; 14+ messages in thread
From: nishant mungse @ 2011-01-27  8:28 UTC (permalink / raw)
  To: device-mapper development



On Wed, Jan 26, 2011 at 3:04 PM, Malahal Naineni <malahal@us.ibm.com> wrote:

> nishant mungse [nishantmungse@gmail.com] wrote:
> >
> >    Hello all,
> >    I have a question about dm-raid1. As far as I know, in dm-raid1
> >    the data is written in parallel to all the mirrors of a mirror
> >    set, and if any mirror fails to write the data, dm-mirror adds
> >    that mirror to the fail list by incrementing the error count in
> >    the fail_mirror() function in dm-raid1.
> >    My question is: where is this error count decremented - before or
> >    after kcopyd, and where exactly?
>
> There is no actual 'fail list' of mirror legs. The error_count is never
> decremented. The 'error_count' is only used to tell if the mirror leg
> has encountered errors. The only way to clear that counter is
> re-loading the table!
>
> See the comment:
>
>        /*
>         * error_count is used for nothing more than a
>         * simple way to tell if a device has encountered
>         * errors.
>         */
>        atomic_inc(&m->error_count);
>
> Hope that helps,
> Malahal.
>

Hi Malahal,

Thanks for the reply. I am not able to understand what
"INIT_WORK(&ms->kmirrord_work, do_mirror)" does. It is called in
mirror_ctr() of dm-raid1. What I think is that do_mirror() will be
called whenever a bio comes in; if that is the case, why is this
written in mirror_ctr()? Please answer this.

Regards,
Nishant.





* Re: Improving dm-mirror as a final year project
  2011-01-25 15:53 Improving dm-mirror as a final year project Miklos Vajna
  2011-01-26 17:24 ` nishant mungse
@ 2011-02-01 16:12 ` Jonathan Brassow
  2011-02-09 17:13   ` Miklos Vajna
  2011-02-14 17:33   ` Miklos Vajna
  1 sibling, 2 replies; 14+ messages in thread
From: Jonathan Brassow @ 2011-02-01 16:12 UTC (permalink / raw)
  To: device-mapper development

On Jan 25, 2011, at 9:53 AM, Miklos Vajna wrote:

> Hi,
>
> I got the opportunity to work on dm-mirror as a final year project at
> ULX, the Hungarian distributor of Red Hat.
>
> To get my feet wet, I created two small patches:
>
> 1) dm-mirror: allow setting ios count to 0
>
> Always read from the default_mirror in that case.
>
> 2) dm-mirror: allow setting the default mirror
>
> These can help in case one data leg of a mirror is a remote (iSCSI)  
> one,
> so the default RR approach is not optimal for reading. (One may set the
> ios count to 0, set the default mirror to the local one, and that will
> cause a read speedup.)
>
> I do not yet have the right to send those patches (I do this in
> university time, so the copyright is not mine), but I hope to be  
> able to
> do so - to get them reviewed.
>
> So the final year project's aim is to improve "the fault tolerance and
> performance of dm-mirror". We (I and my mentors) have two ideas in  
> that
> area, not counting the above patches:
>
> 1) Make the currently hardwired RR read approach modular, that would
> allow implementing for example a weighted RR algorithm. (In case one
> disk is two times faster than the other one, etc.)
>
> 2) From our experiments, it seems that in case the dm-mirror loses one
> of its legs and there is a write to the mirror, it gets converted to a
> linear volume. It would be nice (not sure how easy) to use the mirror
> log to mark the dirty blocks, so that the volume would not be  
> converted
> to a linear one: once the other leg is back, it could be updated based
> on the mirror log.
>
> The question: what do you (who have much more experience with
> dm-mirror than me) think - are these reasonable goals? If not, what
> would you improve/change/add/remove in the above list?

Would you consider working on the recently added dm-raid.c? It is an
attempt to access the MD personalities through device-mapper. As such,
RAID456 (the initial drop) will be available - in addition to RAID1.
There is much to be done in this area - large projects as well as
small - encompassing performance issues, metadata layout, testing, etc.
There is also a lot of attention being paid to this area.

I think device-mapper mirroring will be used for a while, but it will
likely become deprecated.

  brassow


* Re: Improving dm-mirror as a final year project
  2011-02-01 16:12 ` Jonathan Brassow
@ 2011-02-09 17:13   ` Miklos Vajna
  2011-02-14 17:33   ` Miklos Vajna
  1 sibling, 0 replies; 14+ messages in thread
From: Miklos Vajna @ 2011-02-09 17:13 UTC (permalink / raw)
  To: device-mapper development

Hi Jonathan,

On Tue, Feb 01, 2011 at 10:12:36AM -0600, Jonathan Brassow <jbrassow@redhat.com> wrote:
> Would you consider working on the recently added dm-raid.c?  It is an  
> attempt to access the MD personalities through device-mapper.  As  
> such, RAID456 (the initial drop) will be available - in addition to  
> RAID1.  There is much to be done in this area - large projects as well  
> as small - encompassing performance issues, metadata layout, testing,  
> etc...  There is also a lot of attention being paid to this area.
> 
> I think device-mapper mirroring will be used for a while, but it will  
> likely become deprecated.

Sure, that sounds interesting. The first part of my MSc thesis should
be done by 2011-05-13 (14 weeks), and I have about 2 workdays per week
for the project. Of course the actual development time is a bit less,
since I also have to produce a paper from this (about 30 pages for the
first part).

(The second part should be done in Q3-Q4, when I'll have a bit more
dedicated time.)

Do you have a project idea for such a timeframe?

My mentors asked: is dm-raid supported in any RHEL version?

As far as I can see, it was added in commit
9d09e663d5502c46f2d9481c04c1087e1c2da698 after 2.6.37, so it's not part
of RHEL (I created my dm-mirror patches on RHEL5) - and it's a plus for
them if I can do my development based on a RHEL kernel, which is what
they use.

So - given such a timeframe, is it better to work on dm-mirror, which
is fairly stable, or is it feasible to work on dm-raid, even though
it's quite new?

Thanks,

Miklos


* Re: Improving dm-mirror as a final year project
  2011-02-01 16:12 ` Jonathan Brassow
  2011-02-09 17:13   ` Miklos Vajna
@ 2011-02-14 17:33   ` Miklos Vajna
  2011-02-14 21:31     ` Jonathan Brassow
  1 sibling, 1 reply; 14+ messages in thread
From: Miklos Vajna @ 2011-02-14 17:33 UTC (permalink / raw)
  To: device-mapper development


On Tue, Feb 01, 2011 at 10:12:36AM -0600, Jonathan Brassow <jbrassow@redhat.com> wrote:
> > I do not yet have the right to send those patches (I do this in
> > university time, so the copyright is not mine), but I hope to be  
> > able to
> > do so - to get them reviewed.

Hi,

I'm attaching the two patches. I tested them on RHEL5, but I don't
think there are major changes in newer versions. If that's the only
problem, I can test them on the latest linux-2.6.git.

Any feedback is welcome. :)

Thanks,

Miklos

[-- Attachment #2: 0001-dm-mirror-allow-setting-ios-count-to-0.patch --]
[-- Type: text/plain, Size: 1473 bytes --]

From 2a5aaeef1caed0019abdce92fdb6e6a4413677f8 Mon Sep 17 00:00:00 2001
From: Miklos Vajna <vmiklos@ulx.hu>
Date: Sat, 25 Sep 2010 14:56:28 +0200
Subject: [PATCH] dm-mirror: allow setting ios count to 0

Always read from the default_mirror in that case. This can help in case
the default_mirror is way faster than the other data leg.

Signed-off-by: Miklos Vajna <vmiklos@ulx.hu>
---
 drivers/md/dm-raid1.c |   15 +++++++++++++--
 1 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/md/dm-raid1.c b/drivers/md/dm-raid1.c
index 23c1d65..318899e 100644
--- a/drivers/md/dm-raid1.c
+++ b/drivers/md/dm-raid1.c
@@ -849,6 +849,15 @@ static struct mirror *choose_mirror(struct mirror_set *ms)
 	 * the first we tried, so we know when we're done.
 	 */
 	ret = start_mirror = ms->read_mirror;
+
+	/*
+	 * If MIN_READS is zero, then always use the default one.
+	 */
+	if (!atomic_read(&ms->rr_ios_set)) {
+		ret = ms->default_mirror;
+		goto use_mirror;
+	}
+
 	do {
 		if (likely(!atomic_read(&ret->error_count) &&
 			   !atomic_dec_and_test(&ms->rr_ios)))
@@ -1848,8 +1857,10 @@ static int mirror_message(struct dm_target *ti, unsigned argc, char **argv)
 
 	if (sscanf(argv[3], "%u", &rr_ios_set) != 1 ||
 	    rr_ios_set < 2) {
-		DMERR("Round robin read ios have to be > 1");
-		return -EINVAL;
+		if (rr_ios_set != 0) {
+			DMERR("Round robin read ios have to be > 1 or 0");
+			return -EINVAL;
+		}
 	}
 
 	md = dm_table_get_md(ti->table);
-- 
1.5.5.6


[-- Attachment #3: 0002-dm-mirror-allow-setting-the-default-mirror.patch --]
[-- Type: text/plain, Size: 2712 bytes --]

From 02c399b2171dec66d7c014caa1efbc58c0ba8ee1 Mon Sep 17 00:00:00 2001
From: Miklos Vajna <vmiklos@ulx.hu>
Date: Sat, 25 Sep 2010 17:16:11 +0200
Subject: [PATCH] dm-mirror: allow setting the default mirror

This can help in case another data leg is faster than the default one
and the ios count is set to zero.

Signed-off-by: Miklos Vajna <vmiklos@ulx.hu>
---
 drivers/md/dm-raid1.c |   54 +++++++++++++++++++++++++++++++-----------------
 1 files changed, 35 insertions(+), 19 deletions(-)

diff --git a/drivers/md/dm-raid1.c b/drivers/md/dm-raid1.c
index 318899e..b746b0c 100644
--- a/drivers/md/dm-raid1.c
+++ b/drivers/md/dm-raid1.c
@@ -1845,31 +1845,47 @@ static void mirror_resume(struct dm_target *ti)
 /* Set round robin ios via message. */
 static int mirror_message(struct dm_target *ti, unsigned argc, char **argv)
 {
-	unsigned rr_ios_set;
 	struct mirror_set *ms = ti->private;
 	struct mapped_device *md;
 
-	if (argc != 4 ||
-	    strncmp(argv[0], "io_balance", strlen(argv[0])) ||
-	    strncmp(argv[1], "round_robin", strlen(argv[1])) ||
-	    strncmp(argv[2], "ios", strlen(argv[2])))
-		return -EINVAL;
-
-	if (sscanf(argv[3], "%u", &rr_ios_set) != 1 ||
-	    rr_ios_set < 2) {
-		if (rr_ios_set != 0) {
-			DMERR("Round robin read ios have to be > 1 or 0");
-			return -EINVAL;
+	if (argc == 4 &&
+	    !strncmp(argv[0], "io_balance", strlen(argv[0])) &&
+	    !strncmp(argv[1], "round_robin", strlen(argv[1])) &&
+	    !strncmp(argv[2], "ios", strlen(argv[2]))) {
+		unsigned rr_ios_set;
+		if (sscanf(argv[3], "%u", &rr_ios_set) != 1 ||
+				rr_ios_set < 2) {
+			if (rr_ios_set != 0) {
+				DMERR("Round robin read ios have to be > 1 or 0");
+				return -EINVAL;
+			}
 		}
+
+		md = dm_table_get_md(ti->table);
+		DMINFO("Setting round robin read ios for \"%s\" to %u",
+				dm_device_name(md), rr_ios_set);
+		dm_put(md);
+		atomic_set(&ms->rr_ios_set, rr_ios_set);
+		atomic_set(&ms->rr_ios, rr_ios_set);
+		return 0;
 	}
 
-	md = dm_table_get_md(ti->table);
-	DMINFO("Setting round robin read ios for \"%s\" to %u",
-	        dm_device_name(md), rr_ios_set);
-	dm_put(md);
-	atomic_set(&ms->rr_ios_set, rr_ios_set);
-	atomic_set(&ms->rr_ios, rr_ios_set);
-	return 0;
+	if (argc == 3 &&
+		!strncmp(argv[0], "io_balance", strlen(argv[0])) &&
+		!strncmp(argv[1], "default", strlen(argv[1]))) {
+		unsigned int m;
+		for (m = 0; m < ms->nr_mirrors; m++)
+			if (!strncmp(argv[2], ms->mirror[m].dev->name, strlen(argv[2]))) {
+				ms->default_mirror = &ms->mirror[m];
+				md = dm_table_get_md(ti->table);
+				DMINFO("Setting default device for \"%s\" to \"%s\"",
+						dm_device_name(md), argv[2]);
+				dm_put(md);
+				return 0;
+			}
+	}
+
+	return -EINVAL;
 }
 
 /*
-- 
1.5.5.6






* Re: Improving dm-mirror as a final year project
  2011-02-14 17:33   ` Miklos Vajna
@ 2011-02-14 21:31     ` Jonathan Brassow
  2011-02-15 12:52       ` Miklos Vajna
  0 siblings, 1 reply; 14+ messages in thread
From: Jonathan Brassow @ 2011-02-14 21:31 UTC (permalink / raw)
  To: device-mapper development


On Feb 14, 2011, at 11:33 AM, Miklos Vajna wrote:

> On Tue, Feb 01, 2011 at 10:12:36AM -0600, Jonathan Brassow <jbrassow@redhat.com> wrote:
>>> I do not yet have the right to send those patches (I do this in
>>> university time, so the copyright is not mine), but I hope to be
>>> able to
>>> do so - to get them reviewed.
>
> Hi,
>
> I'm attaching the two patches. I tested these on RHEL5, but I don't
> think there are major changes in newer versions, either.
>
> If that's the only problem, I can test it on latest linux-2.6.git.
>
> Any feedback is welcome. :)

Thanks for the patches. I've seen the first one before (slightly
different) - I'll discuss with others whether to include it in RHEL5.
There is no read-balancing in RHEL6/upstream.

The second patch addresses which device should be primary. This can be
done when creating the mirror. I'm not sure how much benefit there is
to doing this additional step. Most people will access dm mirrors
through LVM - not through the dm message interface. If it makes sense
upstream - and you can argue for it - though, I would consider it.

Whether changes are going into RHEL5 or RHEL6, we still like it when
they go upstream first. We generally don't like feature inversion.

If you have any interest in dm-raid, these are some of the things that
need to be done:
1) Definition of new MD superblock: Some of this is started, and I've
got a working version, but I'm sure there are pieces missing related to
offsets that must be tracked for RAID type conversion, etc.
2) Bitmap work: The bitmap keeps track of which areas of the array are
being written. Right now, I take all the bitmap code "as-is". There are
a number of things in this area to be improved. Firstly, we don't
necessarily need all the fields in the bitmap superblock - perhaps this
could be streamlined and added to the new MD superblock. Secondly,
things are way too slow. I get a 10x slowdown when using a bitmap with
RAID1 through device-mapper. This could be due to the region size
chosen, the bitmap being at a stupid offset, or something else. This
problem could be solved by trial-and-error or through profiling and
reason... seems like a great small project.
3) Conversion code: New device-mapper targets (very simple small ones)
must be written to engage the MD RAID conversion code (like when you
change RAID4 to RAID5, for example).
4) Failure testing
5) LVM code: to handle creation of RAID devices
6) dmeventd code: to handle device failures

This is an incomplete list, but it does have a variety of complexity
and duration.

  brassow


* Re: Improving dm-mirror as a final year project
  2011-02-14 21:31     ` Jonathan Brassow
@ 2011-02-15 12:52       ` Miklos Vajna
  2011-02-15 22:13         ` Jonathan Brassow
  0 siblings, 1 reply; 14+ messages in thread
From: Miklos Vajna @ 2011-02-15 12:52 UTC (permalink / raw)
  To: device-mapper development

On Mon, Feb 14, 2011 at 03:31:00PM -0600, Jonathan Brassow <jbrassow@redhat.com> wrote:
> Thanks for the patches.  I've seen the first one before (slightly  
> different) - I'll discuss it with others whether to include in rhel5.   
> There is no read-balancing in rhel6/upstream.

Oh, do I read the code correctly that RHEL6/upstream always reads from
the first mirror and switches only when there is a read failure?

> The second patch addresses which device should be primary.  This can  
> be done when creating the mirror.  I'm not sure how much benefit there  
> is to doing this additional step.  Most people will access dm mirrors  
> through LVM - not through the dm message interface.  If it makes sense  
> upstream - and you can argue for it - though, I would consider it.

Is there such a messaging interface for LVM as well? I chose this way
because I did not have to alter the metadata in this case.

One useful use case I can imagine is when both data legs of the mirror
are provided by iSCSI and the administrator does not realise which is
the faster leg; her bad decision is found out only after there is some
data on the mirror.

My patch allows one to just set the first mirror in that case, without
saving data, recreating the mirror and restoring data. (Unless I missed
some other neat trick for doing so.)

> Whether changes are going into RHEL5 or RHEL6, we still like it when
> they go upstream first.  We generally don't like feature inversion.

Sure - I was not aware at all that the round robin part of the code is
RHEL5-specific.

> If you have any interest in dm-raid, these are some of the things that  
> need to be done:

Thanks for the list - I must admit that some of the points are Chinese
to me; I'm not that familiar with the codebase, just with the basic LVM
commands and concepts.

> 1) Definition of new MD superblock:  Some of this is started, and I've  
> got a working version, but I'm sure there are pieces missing related  
> to offsets that must be tracked for RAID type conversion, etc.
> 2) Bitmap work:  The bitmap keeps track of which areas of the array  
> are being written.  Right now, I take all the bitmap code "as-is".   
> There are a number of things in this area to be improved.  Firstly, we  
> don't necessarily need all the fields in the bitmap superblock -  
> perhaps this could be streamlined and added to the new MD superblock.   
> Secondly, things are way too slow.  I get a 10x slowdown when using a  
> bitmap with RAID1 through device-mapper.  This could be due to  the  
> region-size chosen, the bitmap being at a stupid offset, or something  
> else.  This problem could be solved by trial-and-error or through  
> profiling and reason... seems like a great small project.
> 3) Conversion code:  New device-mapper targets (very simple small  
> ones) must be written to engage the MD RAID conversion code (like when  
> you change RAID4 to RAID5, for example)
> 4) Failure testing
> 5) LVM code: to handle creation of RAID devices
> 6) dmeventd code: to handle device failures

Before choosing from this list:

I first have to evaluate the current status of dm-raid so that my
mentors and I can decide whether the topic of my thesis should be
dm-mirror or dm-raid (i.e. whether dm-raid is mature enough that I can
write a thesis about it). Where is the newest version of dm-raid.c? I
saw the upstream kernel has a single commit from this January, but I
guess the Rawhide / RHEL kernel contained this earlier - maybe there is
a newer version than upstream somewhere?

Also, is there any documentation on dm-raid? Google found
http://www.linux-archive.org/device-mapper-development/454656-dm-raid-wrapper-target-md-raid456.html
but maybe there is now a better way to create raid4 than using
gime_raid.pl?

One last question: is support for raid1 a planned feature? I think that
would be interesting as well (if dm-raid is going to replace dm-mirror
in the long run).

Thanks!


* Re: Improving dm-mirror as a final year project
  2011-02-15 12:52       ` Miklos Vajna
@ 2011-02-15 22:13         ` Jonathan Brassow
  2011-02-16 17:12           ` Miklos Vajna
  0 siblings, 1 reply; 14+ messages in thread
From: Jonathan Brassow @ 2011-02-15 22:13 UTC (permalink / raw)
  To: device-mapper development



On Feb 15, 2011, at 6:52 AM, Miklos Vajna wrote:

> On Mon, Feb 14, 2011 at 03:31:00PM -0600, Jonathan Brassow <jbrassow@redhat.com> wrote:
>> Thanks for the patches.  I've seen the first one before (slightly
>> different) - I'll discuss it with others whether to include in rhel5.
>> There is no read-balancing in rhel6/upstream.
>
> Oh, do I read the code correctly that rhel6/upstream always reads from
> the first mirror and switches only in case there is a read failure?

yes

>
>> The second patch addresses which device should be primary.  This can
>> be done when creating the mirror.  I'm not sure how much benefit  
>> there
>> is to doing this additional step.  Most people will access dm mirrors
>> through LVM - not through the dm message interface.  If it makes  
>> sense
>> upstream - and you can argue for it - though, I would consider it.
>
> Is there such a messaging interface for LVM as well? I chose this way
> because I did not have to alter the metadata in this case.

There is no message interface via LVM (although LVM could use the
message interface for some things).

> One useful use case I can imagine is when both data legs of the mirror
> are provided by iSCSI and the administrator does not realise which is
> the faster leg; her bad decision is found out only after there is some
> data on the mirror.

Perhaps, but if you don't encode this in the LVM metadata, you will
have to perform the action every time you reboot. Instead, you could
reorder the devices in userspace and reload the table.
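
Something along these lines (a sketch only - the device name, size and
table line are illustrative, not from a real setup):

----
dmsetup table mirror   # print the current mirror table
dmsetup suspend mirror
# same table with the two legs swapped, read back from stdin
echo "0 192717 mirror core 1 1024 2 /dev/sdc1 0 /dev/sdb1 0" | \
    dmsetup reload mirror
dmsetup resume mirror
----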

> My patch allows one to just set the first mirror in that case, without
> saving data, recreating the mirror and restoring data. (Unless I missed
> some other neat trick for doing so.)
>
>> Whether changes are going into RHEL5 or RHEL6, we still like it when
>> they go upstream first.  We generally don't like feature inversion.
>
> Sure - I was not aware at all that the round robin part of the code is
> RHEL5-specific.
>
>> If you have any interest in dm-raid, these are some of the things
>> that need to be done:
>
> Thanks for the list - I must admit that some of the points are Chinese
> to me; I'm not that familiar with the codebase, just with the basic
> LVM commands and concepts.
>
>> 1) Definition of new MD superblock:  Some of this is started, and  
>> I've
>> got a working version, but I'm sure there are pieces missing related
>> to offsets that must be tracked for RAID type conversion, etc.
>> 2) Bitmap work:  The bitmap keeps track of which areas of the array
>> are being written.  Right now, I take all the bitmap code "as-is".
>> There are a number of things in this area to be improved.  Firstly,  
>> we
>> don't necessarily need all the fields in the bitmap superblock -
>> perhaps this could be streamlined and added to the new MD superblock.
>> Secondly, things are way too slow.  I get a 10x slowdown when using a
>> bitmap with RAID1 through device-mapper.  This could be due to  the
>> region-size chosen, the bitmap being at a stupid offset, or something
>> else.  This problem could be solved by trial-and-error or through
>> profiling and reason... seems like a great small project.
>> 3) Conversion code:  New device-mapper targets (very simple small
>> ones) must be written to engage the MD RAID conversion code (like  
>> when
>> you change RAID4 to RAID5, for example)
>> 4) Failure testing
>> 5) LVM code: to handle creation of RAID devices
>> 6) dmeventd code: to handle device failures
>
> Before choosing from this list:
>
> I first have to evaluate the current status of dm-raid so that we can
> decide with my mentors if the topic of my thesis should be dm-mirror  
> or
> dm-raid (ie. if dm-raid is mature enough that I can write a thesis  
> about
> it). Where is the newest version of dm-raid.c? I saw the upstream  
> kernel
> has a single commit from this January, but I guess the rawhide / rhel
> kernel contained this earlier - maybe there is a newer version than
> upstream somewhere?

The basic component that covers RAID456 is available upstream, as you
saw. I have an additional set of ~12 (reasonably small) patches that
add RAID1 and superblock/bitmap support. These patches are not yet
upstream nor are they in any RHEL product.

> Also, is there any documentation on dm-raid? Google found
> http://www.linux-archive.org/device-mapper-development/454656-dm-raid-wrapper-target-md-raid456.html
> but maybe there is now a better way to create raid4 than using
> gime_raid.pl?

Yes, I have a script called 'gime_raid.pl' that creates the
device-mapper tables for dm-raid. Eventually, this will be pushed into
LVM, but it was much easier (for testing purposes) to start with a Perl
script.
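
For a two-leg RAID1, for example, it boils down to building and loading
a single dm table line, roughly (a sketch; this matches the raid1 table
seen later in the thread):

----
# 0 <len> raid <raid_type> <#raid_params> <#devs> <meta1> <dev1> ...
# "-" means no metadata device for that leg
echo "0 192717 raid raid1 0 2 - /dev/sdb1 - /dev/sdc1" | dmsetup create raid
----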

> And a last question: is support for raid1 a planned feature? I think
> that would be interesting as well. (If dm-raid is going to replace
> dm-mirror in the long run.)

Yes, RAID1 is planned - and it already works to a large extent.

  brassow

For convenience, I've attached the patches I'm working on (quilt
directory) and the latest gime_raid.pl script.


[-- Attachment #2: dm-raid-patches.tgz --]
[-- Type: application/octet-stream, Size: 14456 bytes --]





* Re: Improving dm-mirror as a final year project
  2011-02-15 22:13         ` Jonathan Brassow
@ 2011-02-16 17:12           ` Miklos Vajna
  2011-02-17 10:49             ` Miklos Vajna
  0 siblings, 1 reply; 14+ messages in thread
From: Miklos Vajna @ 2011-02-16 17:12 UTC (permalink / raw)
  To: device-mapper development

On Tue, Feb 15, 2011 at 04:13:15PM -0600, Jonathan Brassow <jbrassow@redhat.com> wrote:
> > Oh, do I read the code correctly that rhel6/upstream always reads from
> > the first mirror and switches only in case there is a read failure?
> 
> yes

Hm, what was the reason for dropping the round-robin feature? I thought
round robin gives better performance, given that reads are issued
asynchronously. Did I miss something, or did nobody port the patch to
RHEL6/upstream?

> Perhaps, but if you don't encode this in the LVM metadata, you will  
> have to perform the action every time you reboot.  Instead, you could  
> reorder the devices in userspace and reload the table.

I was not aware that such a reorder is possible. That patch then does
not make too much sense, agreed. ;) (Do you have a pointer to some
documentation on how that reorder can be done? I can't find anything
about reordering in the lvconvert/dmsetup man pages.)

> The basic component that covers RAID456 is available upstream, as you  
> saw.  I have an additional set of ~12 (reasonably small) patches that  
> add RAID1 and superblock/bitmap support.  These patches are not yet  
> upstream nor are they in any RHEL product.

Then what is the recommended platform for hacking on dm-raid? I have
RHEL6 at the moment. Is it OK to cherry-pick the single commit from
upstream and apply your patches, or is it better to install Rawhide,
where the kernel is already 2.6.38-rc5 (as far as I can see), and only
apply your patches there?

> Yes, I have a script called 'gime_raid.pl' that creates the device- 
> mapper tables for dm-raid.  Eventually, this will be pushed into LVM,  
> but it was much easier (for testing purposes) to start with a perl  
> script.

Sure. :)

> For convenience, I've attached the patches I'm working on (quilt  
> directory) and the latest gime_raid.pl script.

Thanks!


* Re: Improving dm-mirror as a final year project
  2011-02-16 17:12           ` Miklos Vajna
@ 2011-02-17 10:49             ` Miklos Vajna
  2011-02-17 21:02               ` Jonathan Brassow
  0 siblings, 1 reply; 14+ messages in thread
From: Miklos Vajna @ 2011-02-17 10:49 UTC (permalink / raw)
  To: device-mapper development

On Wed, Feb 16, 2011 at 06:12:19PM +0100, Miklos Vajna <vmiklos@ulx.hu> wrote:
> > The basic component that covers RAID456 is available upstream, as you  
> > saw.  I have an additional set of ~12 (reasonably small) patches that  
> > add RAID1 and superblock/bitmap support.  These patches are not yet  
> > upstream nor are they in any RHEL product.
> 
> Then what is the recommended platform to hack dm-raid? I have RHEL6 at
> the moment. Is it OK to try to cherry-pick the single commit from
> upstream + apply your patches or is it better to install rawhide where
> the kernel is already 2.6.38rc5 (as far as I see) and only apply your
> patches there?
>
> (...) 
> 
> > For convenience, I've attached the patches I'm working on (quilt  
> > directory) and the latest gime_raid.pl script.

I tried these on Fedora 15 with mixed results.

I prepared a kernel source tree:

----
$ wget http://download.fedora.redhat.com/pub/fedora/linux/development/15/source/SRPMS/kernel-2.6.38-0.rc4.git0.2.fc15.src.rpm
$ rpm -Uvh kernel-2.6.38-0.rc4.git0.2.fc15.src.rpm 2>&1 | grep -v mockb
$ cd ~/rpmbuild/SPECS
# yum install xmlto asciidoc elfutils-devel 'perl(ExtUtils::Embed)' # build-depends for kernel
$ rpmbuild -bp --target=`uname -m` kernel.spec 2>&1 | tee prep.log
----

Then continue the build manually:

----
$ cd ~/rpmbuild/BUILD/kernel-2.6.37.fc15/linux-2.6.37.i686
$ perl -p -i -e "s/^EXTRAVERSION.*/EXTRAVERSION = -0.rc4.git7.1.fc15.i686.PAE/" Makefile

$ cp /usr/src/kernels/2.6.38-0.rc4.git7.1.fc15.i686.PAE/Module.symvers .

$ make prepare
$ make modules_prepare
$ make M=drivers/md
# cp drivers/md/*.ko /lib/modules/2.6.38-0.rc4.git7.1.fc15.i686.PAE/extra/
# depmod -a
----

Apply the patches:

----
$ git quiltimport --author "Jonathan Brassow <jbrassow@redhat.com>" --patches ~/dm-raid-patches/
----

Build, install and reload the modules, then try it:

----
# ~vmiklos/dm-raid-patches/gime_raid.pl raid4 /dev/sd[bcdef]1
RAID type    : raid4
Block devices: /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1
Device size  : 770868BB
# ls /dev/mapper/
control  raid  vg_diploma1f-lv_root  vg_diploma1f-lv_swap
----

No hang this time! (It hung before I applied your patches.)

However, almost everything else fails. Here is what I tried:

(I reverted the virtual machine to a snapshot before each attempt.)

Creating a raid1:

----
# ~vmiklos/dm-raid-patches/gime_raid.pl raid1 /dev/sd[bc]1
RAID type    : raid1
Block devices: /dev/sdb1 /dev/sdc1
Device size  : 192717BB

Message from syslogd@diploma1-f at Feb 17 11:19:57 ...
 kernel:[ 1395.143044] Oops: 0000 [#1] SMP

(...)

sh: line 1:  4209 Done                    echo 0 192717 raid raid1 0 2 - /dev/sdb1 - /dev/sdc1
      4210 Killed                  | dmsetup create raid
Failed to create "raid":
  0 192717 raid raid1 0 2 - /dev/sdb1 - /dev/sdc1
----

dmesg:

----
[ 1269.709043] bio: create slab <bio-1> at 1
[ 1269.733414] md/raid1:mdX: not clean -- starting background reconstruction
[ 1269.734568] md/raid1:mdX: active with 2 out of 2 mirrors
[ 1269.752416] BUG: unable to handle kernel NULL pointer dereference at 000001f4
[ 1269.753039] IP: [<c070c3d0>] md_integrity_register+0x29/0xdc
[ 1269.753039] *pdpt = 000000000c77c001 *pde = 000000000f5be067 *pte = 0000000000000000 
[ 1269.753039] Oops: 0000 [#1] SMP 
[ 1269.753039] last sysfs file: /sys/module/raid1/initstate
[ 1269.753039] Modules linked in: dm_raid raid1 raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx sunrpc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables snd_ens1371 gameport snd_rawmidi snd_ac97_codec ac97_bus snd_seq snd_seq_device snd_pcm snd_timer ppdev vmw_balloon snd pcnet32 mii soundcore snd_page_alloc i2c_piix4 parport_pc i2c_core parport ipv6 mptspi mptscsih mptbase scsi_transport_spi [last unloaded: raid456]
[ 1269.753039] 
[ 1269.753039] Pid: 4227, comm: dmsetup Not tainted 2.6.38-0.rc4.git7.1.fc15.i686.PAE #1 440BX Desktop Reference Platform/VMware Virtual Platform
[ 1269.753039] EIP: 0060:[<c070c3d0>] EFLAGS: 00010246 CPU: 0
[ 1269.753039] EIP is at md_integrity_register+0x29/0xdc
[ 1269.753039] EAX: 00000000 EBX: cf8dd398 ECX: 00000000 EDX: 00000000
[ 1269.753039] ESI: cf8dd00c EDI: 00000000 EBP: cc79dd4c ESP: cc79dd34
[ 1269.753039]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[ 1269.753039] Process dmsetup (pid: 4227, ti=cc79c000 task=cf8e25b0 task.ti=cc79c000)
[ 1269.753039] Stack:
[ 1269.753039]  cc780f68 cc780f68 cf8dd01c cf8dd00c cc780f68 0000000c cc79dd74 d10b7d46
[ 1269.753039]  d10ba952 d10ba775 00000002 00000002 000000e0 cf8dd01c cf8dd00c cf8dd01c
[ 1269.753039]  cc79ddfc c070f8cf 00000001 c098c29a 00000001 00000021 c040b453 cf8dd394
[ 1269.753039] Call Trace:
[ 1269.753039]  [<d10b7d46>] run+0x24c/0x256 [raid1]
[ 1269.753039]  [<c070f8cf>] md_run+0x4f0/0x7ab
[ 1269.753039]  [<c040b453>] ? do_softirq+0x8c/0x92
[ 1269.753039]  [<c0420d64>] ? smp_apic_timer_interrupt+0x6b/0x78
[ 1269.753039]  [<c0435ad9>] ? __might_sleep+0x29/0xe4
[ 1269.753039]  [<c042f5a4>] ? should_resched+0xd/0x27
[ 1269.753039]  [<c07e603e>] ? _cond_resched+0xd/0x21
[ 1269.753039]  [<d10f5ff7>] raid_ctr+0x8b4/0x8d4 [dm_raid]
[ 1269.753039]  [<c07177f5>] dm_table_add_target+0x165/0x202
[ 1269.753039]  [<c04d4f7f>] ? vfree+0x25/0x27
[ 1269.753039]  [<c071715b>] ? alloc_targets+0x8c/0xb1
[ 1269.753039]  [<c071a05c>] table_load+0x233/0x242
[ 1269.753039]  [<c0719aa8>] dm_ctl_ioctl+0x1af/0x1ed
[ 1269.753039]  [<c043474f>] ? pick_next_task_fair+0x85/0x8d
[ 1269.753039]  [<c0719e29>] ? table_load+0x0/0x242
[ 1269.753039]  [<c07198f9>] ? dm_ctl_ioctl+0x0/0x1ed
[ 1269.753039]  [<c04f9c4f>] do_vfs_ioctl+0x451/0x482
[ 1269.753039]  [<c0491ca0>] ? __call_rcu+0xdb/0xe1
[ 1269.753039]  [<c0501713>] ? mntput_no_expire+0x28/0xbd
[ 1269.753039]  [<c05017c6>] ? mntput+0x1e/0x20
[ 1269.753039]  [<c04f4f4f>] ? path_put+0x1a/0x1d
[ 1269.753039]  [<c04f9cc7>] sys_ioctl+0x47/0x60
[ 1269.753039]  [<c040969f>] sysenter_do_call+0x12/0x28
[ 1269.753039] Code: 5d c3 55 89 e5 57 56 53 83 ec 0c 3e 8d 74 26 00 89 c6 8b 5e 10 8d 40 10 89 45 f0 31 c0 3b 5d f0 0f 84 b0 00 00 00 8b 56 30 31 ff <83> ba f4 01 00 00 00 0f 85 9e 00 00 00 eb 38 8b 43 6c a8 02 75 
[ 1269.753039] EIP: [<c070c3d0>] md_integrity_register+0x29/0xdc SS:ESP 0068:cc79dd34
[ 1269.753039] CR2: 00000000000001f4
[ 1269.898378] ---[ end trace 49f34abab1d4a1b8 ]---
----

Creating and then deleting a raid4:

----
root@diploma1-f:~# ~vmiklos/dm-raid-patches/gime_raid.pl raid4 /dev/sd[bcdef]1
RAID type    : raid4
Block devices: /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1
Device size  : 770868BB

root@diploma1-f:~# dmsetup remove raid
Killed
----

dmesg:

----
[ 1289.594670] bio: create slab <bio-1> at 1
[ 1289.617603] md/raid:mdX: not clean -- starting background reconstruction
[ 1289.623386] md/raid:mdX: device sdf1 operational as raid disk 4
[ 1289.634279] md/raid:mdX: device sde1 operational as raid disk 3
[ 1289.634889] md/raid:mdX: device sdd1 operational as raid disk 2
[ 1289.635479] md/raid:mdX: device sdc1 operational as raid disk 1
[ 1289.636071] md/raid:mdX: device sdb1 operational as raid disk 0
[ 1289.661945] md/raid:mdX: allocated 5265kB
[ 1289.663995] md/raid:mdX: raid level 5 active with 5 out of 5 devices, algorithm 4
[ 1289.665539] RAID conf printout:
[ 1289.665630]  --- level:5 rd:5 wd:5
[ 1289.665742]  disk 0, o:1, dev:sdb1
[ 1289.665759]  disk 1, o:1, dev:sdc1
[ 1289.665765]  disk 2, o:1, dev:sdd1
[ 1289.665770]  disk 3, o:1, dev:sde1
[ 1289.665776]  disk 4, o:1, dev:sdf1
[ 1289.685823] md: resync of RAID array mdX
[ 1289.686486] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[ 1289.687117] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for resync.
[ 1289.701594] md: using 128k window, over a total of 96256 blocks.
[ 1289.892256] attempt to access beyond end of device
[ 1289.893552] sdc1: rw=0, want=193160, limit=192717
[ 1289.895278] attempt to access beyond end of device
[ 1289.895768] sdf1: rw=0, want=193160, limit=192717
[ 1289.896795] attempt to access beyond end of device
[ 1289.897354] sde1: rw=0, want=193160, limit=192717
[ 1289.897991] attempt to access beyond end of device
[ 1289.898565] sdd1: rw=0, want=193160, limit=192717
[ 1289.899524] attempt to access beyond end of device
[ 1289.900051] sdb1: rw=0, want=193160, limit=192717
[ 1289.901523] md/raid:mdX: Disk failure on sdf1, disabling device.
[ 1289.901528] md/raid:mdX: Operation continuing on 4 devices.
[ 1289.909029] md/raid:mdX: Disk failure on sde1, disabling device.
[ 1289.909033] md/raid:mdX: Operation continuing on 3 devices.
[ 1289.917651] md/raid:mdX: Disk failure on sdd1, disabling device.
[ 1289.917656] md/raid:mdX: Operation continuing on 2 devices.
[ 1289.918748] md/raid:mdX: Disk failure on sdc1, disabling device.
[ 1289.918752] md/raid:mdX: Operation continuing on 1 devices.
[ 1289.933445] md/raid:mdX: Disk failure on sdb1, disabling device.
[ 1289.933449] md/raid:mdX: Operation continuing on 0 devices.
[ 1289.936280] Buffer I/O error on device dm-2, logical block 192673
[ 1289.937184] Buffer I/O error on device dm-2, logical block 192672
[ 1289.954961] Buffer I/O error on device dm-2, logical block 192673
[ 1289.955619] Buffer I/O error on device dm-2, logical block 192672
[ 1289.956650] Buffer I/O error on device dm-2, logical block 192712
[ 1289.960572] Buffer I/O error on device dm-2, logical block 192713
[ 1289.961338] Buffer I/O error on device dm-2, logical block 192712
[ 1289.962174] Buffer I/O error on device dm-2, logical block 192713
[ 1289.963016] Buffer I/O error on device dm-2, logical block 0
[ 1289.977325] Buffer I/O error on device dm-2, logical block 1
[ 1290.812860] md: mdX: resync done.
[ 1290.813983] md: checkpointing resync of mdX.
[ 1290.822997] md: resync of RAID array mdX
[ 1290.831231] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[ 1290.832595] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for resync.
[ 1290.833594] md: using 128k window, over a total of 96256 blocks.
[ 1290.834244] md: resuming resync of mdX from checkpoint.
[ 1290.834877] md: mdX: resync done.
[ 1290.850001] RAID conf printout:
[ 1290.850062]  --- level:5 rd:5 wd:0
[ 1290.850070]  disk 0, o:0, dev:sdb1
[ 1290.850076]  disk 1, o:0, dev:sdc1
[ 1290.850081]  disk 2, o:0, dev:sdd1
[ 1290.850086]  disk 3, o:0, dev:sde1
[ 1290.850091]  disk 4, o:0, dev:sdf1
[ 1293.215726] BUG: sleeping function called from invalid context at arch/x86/mm/fault.c:1081
[ 1293.216017] in_atomic(): 0, irqs_disabled(): 1, pid: 4250, name: dmsetup
[ 1293.216017] Pid: 4250, comm: dmsetup Not tainted 2.6.38-0.rc4.git7.1.fc15.i686.PAE #1
[ 1293.216017] Call Trace:
[ 1293.216017]  [<c0435b8d>] ? __might_sleep+0xdd/0xe4
[ 1293.216017]  [<c07ea191>] ? do_page_fault+0x179/0x30c
[ 1293.216017]  [<c0465112>] ? tick_dev_program_event+0x2f/0x137
[ 1293.216017]  [<c04e0b72>] ? kmem_cache_free+0x67/0x94
[ 1293.216017]  [<c0535dfd>] ? release_sysfs_dirent+0x79/0x8c
[ 1293.216017]  [<c07ea018>] ? do_page_fault+0x0/0x30c
[ 1293.216017]  [<c07e7d97>] ? error_code+0x67/0x6c
[ 1293.216017]  [<c07e726b>] ? _raw_spin_lock_irqsave+0x15/0x27
[ 1293.216017]  [<c05c7d92>] ? __disk_unblock_events+0x23/0x9e
[ 1293.216017]  [<c042f5a4>] ? should_resched+0xd/0x27
[ 1293.216017]  [<c05c91bc>] ? disk_unblock_events+0x1b/0x1d
[ 1293.216017]  [<c0511052>] ? blkdev_put+0xbb/0xe7
[ 1293.216017]  [<c0716f35>] ? close_dev+0x30/0x3a
[ 1293.216017]  [<c0716f61>] ? dm_put_device+0x22/0x33
[ 1293.216017]  [<d10f56e7>] ? context_free+0x58/0x73 [dm_raid]
[ 1293.216017]  [<d10f5740>] ? raid_dtr+0x3e/0x41 [dm_raid]
[ 1293.216017]  [<c0717550>] ? dm_table_destroy+0x5b/0xcf
[ 1293.216017]  [<c0469a28>] ? arch_local_irq_save+0x12/0x17
[ 1293.216017]  [<c0715ae3>] ? __dm_destroy+0xfa/0x1c2
[ 1293.216017]  [<c0716412>] ? dm_destroy+0x12/0x14
[ 1293.216017]  [<c071a146>] ? dev_remove+0xdb/0xe5
[ 1293.216017]  [<c05dae00>] ? copy_to_user+0x12/0x4b
[ 1293.216017]  [<c0719aa8>] ? dm_ctl_ioctl+0x1af/0x1ed
[ 1293.216017]  [<c058fae7>] ? newary+0x10a/0x11c
[ 1293.216017]  [<c071a06b>] ? dev_remove+0x0/0xe5
[ 1293.216017]  [<c07198f9>] ? dm_ctl_ioctl+0x0/0x1ed
[ 1293.216017]  [<c04f9c4f>] ? do_vfs_ioctl+0x451/0x482
[ 1293.216017]  [<c0491ca0>] ? __call_rcu+0xdb/0xe1
[ 1293.216017]  [<c0501713>] ? mntput_no_expire+0x28/0xbd
[ 1293.216017]  [<c05017c6>] ? mntput+0x1e/0x20
[ 1293.216017]  [<c04f4f4f>] ? path_put+0x1a/0x1d
[ 1293.216017]  [<c04f9cc7>] ? sys_ioctl+0x47/0x60
[ 1293.216017]  [<c040969f>] ? sysenter_do_call+0x12/0x28
[ 1293.216017] BUG: unable to handle kernel NULL pointer dereference at 0000080c
[ 1293.216017] IP: [<c07e726b>] _raw_spin_lock_irqsave+0x15/0x27
[ 1293.216017] *pdpt = 000000000f7cb001 *pde = 000000000c661067 *pte = 0000000000000000 
[ 1293.216017] Oops: 0002 [#1] SMP 
[ 1293.216017] last sysfs file: /sys/devices/virtual/block/dm-2/range
[ 1293.216017] Modules linked in: dm_raid raid1 raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx sunrpc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables snd_ens1371 gameport snd_rawmidi snd_ac97_codec ac97_bus snd_seq snd_seq_device snd_pcm snd_timer ppdev vmw_balloon snd pcnet32 mii soundcore snd_page_alloc i2c_piix4 parport_pc i2c_core parport ipv6 mptspi mptscsih mptbase scsi_transport_spi [last unloaded: raid456]
[ 1293.216017] 
[ 1293.216017] Pid: 4250, comm: dmsetup Not tainted 2.6.38-0.rc4.git7.1.fc15.i686.PAE #1 440BX Desktop Reference Platform/VMware Virtual Platform
[ 1293.216017] EIP: 0060:[<c07e726b>] EFLAGS: 00010082 CPU: 0
[ 1293.216017] EIP is at _raw_spin_lock_irqsave+0x15/0x27
[ 1293.216017] EAX: 00000282 EBX: 0000080c ECX: 00000083 EDX: 00000100
[ 1293.216017] ESI: cf98be00 EDI: 00000001 EBP: cf5b9dfc ESP: cf5b9df8
[ 1293.216017]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[ 1293.216017] Process dmsetup (pid: 4250, ti=cf5b8000 task=cfa77110 task.ti=cf5b8000)
[ 1293.216017] Stack:
[ 1293.216017]  00000800 cf5b9e18 c05c7d92 c042f5a4 0000080c cbc31400 cbc31410 00000083
[ 1293.216017]  cf5b9e20 c05c91bc cf5b9e34 c0511052 cda488c0 cccc5000 00000000 cf5b9e40
[ 1293.216017]  c0716f35 cda488c0 cf5b9e4c c0716f61 cccc5000 cf5b9e60 d10f56e7 cccc5000
[ 1293.216017] Call Trace:
[ 1293.216017]  [<c05c7d92>] __disk_unblock_events+0x23/0x9e
[ 1293.216017]  [<c042f5a4>] ? should_resched+0xd/0x27
[ 1293.216017]  [<c05c91bc>] disk_unblock_events+0x1b/0x1d
[ 1293.216017]  [<c0511052>] blkdev_put+0xbb/0xe7
[ 1293.216017]  [<c0716f35>] close_dev+0x30/0x3a
[ 1293.216017]  [<c0716f61>] dm_put_device+0x22/0x33
[ 1293.216017]  [<d10f56e7>] context_free+0x58/0x73 [dm_raid]
[ 1293.216017]  [<d10f5740>] raid_dtr+0x3e/0x41 [dm_raid]
[ 1293.216017]  [<c0717550>] dm_table_destroy+0x5b/0xcf
[ 1293.216017]  [<c0469a28>] ? arch_local_irq_save+0x12/0x17
[ 1293.216017]  [<c0715ae3>] __dm_destroy+0xfa/0x1c2
[ 1293.216017]  [<c0716412>] dm_destroy+0x12/0x14
[ 1293.216017]  [<c071a146>] dev_remove+0xdb/0xe5
[ 1293.216017]  [<c05dae00>] ? copy_to_user+0x12/0x4b
[ 1293.216017]  [<c0719aa8>] dm_ctl_ioctl+0x1af/0x1ed
[ 1293.216017]  [<c058fae7>] ? newary+0x10a/0x11c
[ 1293.216017]  [<c071a06b>] ? dev_remove+0x0/0xe5
[ 1293.216017]  [<c07198f9>] ? dm_ctl_ioctl+0x0/0x1ed
[ 1293.216017]  [<c04f9c4f>] do_vfs_ioctl+0x451/0x482
[ 1293.216017]  [<c0491ca0>] ? __call_rcu+0xdb/0xe1
[ 1293.216017]  [<c0501713>] ? mntput_no_expire+0x28/0xbd
[ 1293.216017]  [<c05017c6>] ? mntput+0x1e/0x20
[ 1293.216017]  [<c04f4f4f>] ? path_put+0x1a/0x1d
[ 1293.216017]  [<c04f9cc7>] sys_ioctl+0x47/0x60
[ 1293.216017]  [<c040969f>] sysenter_do_call+0x12/0x28
[ 1293.216017] Code: 0f 95 c0 0f b6 c0 c3 55 89 e5 3e 8d 74 26 00 e8 06 28 c8 ff 5d c3 55 89 e5 53 3e 8d 74 26 00 89 c3 e8 b0 27 c8 ff ba 00 01 00 00 <3e> 66 0f c1 13 38 f2 74 06 f3 90 8a 13 eb f6 5b 5d c3 55 89 e5 
[ 1293.216017] EIP: [<c07e726b>] _raw_spin_lock_irqsave+0x15/0x27 SS:ESP 0068:cf5b9df8
[ 1293.216017] CR2: 000000000000080c
[ 1293.216017] ---[ end trace 49f34abab1d4a1b8 ]---
----

Creating a raid4 and creating a filesystem on it:

----
root@diploma1-f:~# ~vmiklos/dm-raid-patches/gime_raid.pl raid4 /dev/sd[bcdef]1
RAID type    : raid4
Block devices: /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1
Device size  : 770868BB

root@diploma1-f:~# mkfs.ext4 /dev/mapper/raid 
mke2fs 1.41.14 (22-Dec-2010)
----

and it hangs here.

dmesg:

----
[ 1270.954599] bio: create slab <bio-1> at 1
[ 1270.963550] md/raid:mdX: not clean -- starting background reconstruction
[ 1270.987463] md/raid:mdX: device sdf1 operational as raid disk 4
[ 1270.989895] md/raid:mdX: device sde1 operational as raid disk 3
[ 1270.990513] md/raid:mdX: device sdd1 operational as raid disk 2
[ 1270.991101] md/raid:mdX: device sdc1 operational as raid disk 1
[ 1270.992075] md/raid:mdX: device sdb1 operational as raid disk 0
[ 1271.028680] md/raid:mdX: allocated 5265kB
[ 1271.047135] md/raid:mdX: raid level 5 active with 5 out of 5 devices, algorithm 4
[ 1271.048513] RAID conf printout:
[ 1271.048604]  --- level:5 rd:5 wd:5
[ 1271.048749]  disk 0, o:1, dev:sdb1
[ 1271.048768]  disk 1, o:1, dev:sdc1
[ 1271.048773]  disk 2, o:1, dev:sdd1
[ 1271.048779]  disk 3, o:1, dev:sde1
[ 1271.048784]  disk 4, o:1, dev:sdf1
[ 1271.079401] md: resync of RAID array mdX
[ 1271.080482] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[ 1271.081111] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for resync.
[ 1271.082639] md: using 128k window, over a total of 96256 blocks.
[ 1271.261775] attempt to access beyond end of device
[ 1271.262542] sdc1: rw=0, want=193160, limit=192717
[ 1271.263924] attempt to access beyond end of device
[ 1271.264467] sdf1: rw=0, want=193160, limit=192717
[ 1271.265172] attempt to access beyond end of device
[ 1271.265747] sde1: rw=0, want=193160, limit=192717
[ 1271.266436] attempt to access beyond end of device
[ 1271.266946] sdd1: rw=0, want=193160, limit=192717
[ 1271.267872] attempt to access beyond end of device
[ 1271.281551] sdb1: rw=0, want=193160, limit=192717
[ 1271.282992] md/raid:mdX: Disk failure on sdf1, disabling device.
[ 1271.282996] md/raid:mdX: Operation continuing on 4 devices.
[ 1271.284807] md/raid:mdX: Disk failure on sde1, disabling device.
[ 1271.284811] md/raid:mdX: Operation continuing on 3 devices.
[ 1271.285962] md/raid:mdX: Disk failure on sdd1, disabling device.
[ 1271.285966] md/raid:mdX: Operation continuing on 2 devices.
[ 1271.300746] md/raid:mdX: Disk failure on sdc1, disabling device.
[ 1271.300750] md/raid:mdX: Operation continuing on 1 devices.
[ 1271.302077] md/raid:mdX: Disk failure on sdb1, disabling device.
[ 1271.302081] md/raid:mdX: Operation continuing on 0 devices.
[ 1271.322206] Buffer I/O error on device dm-2, logical block 192673
[ 1271.323102] Buffer I/O error on device dm-2, logical block 192672
[ 1271.326465] Buffer I/O error on device dm-2, logical block 192672
[ 1271.338964] Buffer I/O error on device dm-2, logical block 192673
[ 1271.339820] Buffer I/O error on device dm-2, logical block 192712
[ 1271.340476] Buffer I/O error on device dm-2, logical block 192713
[ 1271.341118] Buffer I/O error on device dm-2, logical block 192712
[ 1271.341700] Buffer I/O error on device dm-2, logical block 192713
[ 1271.356296] Buffer I/O error on device dm-2, logical block 0
[ 1271.356867] Buffer I/O error on device dm-2, logical block 1
[ 1272.188477] md: mdX: resync done.
[ 1272.189634] md: checkpointing resync of mdX.
[ 1272.192828] md: resync of RAID array mdX
[ 1272.193363] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[ 1272.194090] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for resync.
[ 1272.195083] md: using 128k window, over a total of 96256 blocks.
[ 1272.195723] md: resuming resync of mdX from checkpoint.
[ 1272.209475] md: mdX: resync done.
[ 1272.210697] RAID conf printout:
[ 1272.210709]  --- level:5 rd:5 wd:0
[ 1272.210716]  disk 0, o:0, dev:sdb1
[ 1272.210721]  disk 1, o:0, dev:sdc1
[ 1272.210726]  disk 2, o:0, dev:sdd1
[ 1272.210731]  disk 3, o:0, dev:sde1
[ 1272.210736]  disk 4, o:0, dev:sdf1
----

Are these expected, or does the above work for you?

Thanks.


* Re: Improving dm-mirror as a final year project
  2011-02-17 10:49             ` Miklos Vajna
@ 2011-02-17 21:02               ` Jonathan Brassow
  0 siblings, 0 replies; 14+ messages in thread
From: Jonathan Brassow @ 2011-02-17 21:02 UTC (permalink / raw)
  To: device-mapper development


On Feb 17, 2011, at 4:49 AM, Miklos Vajna wrote:

> On Wed, Feb 16, 2011 at 06:12:19PM +0100, Miklos Vajna <vmiklos@ulx.hu> wrote:
>>> The basic component that covers RAID456 is available upstream, as  
>>> you
>>> saw.  I have an additional set of ~12 (reasonably small) patches  
>>> that
>>> add RAID1 and superblock/bitmap support.  These patches are not yet
>>> upstream nor are they in any RHEL product.
>>
>> Then what is the recommended platform for hacking on dm-raid? I have
>> RHEL6 at the moment. Is it OK to cherry-pick the single commit from
>> upstream and apply your patches, or is it better to install Rawhide,
>> where the kernel is already 2.6.38-rc5 (as far as I can see), and
>> only apply your patches there?

I tend to use RHEL6 + the latest upstream kernel. I would /not/
cherry-pick the upstream dm-raid patch and then add my next set - you
might miss some other valuable MD patches (there have been a couple).

>>
>>> For convenience, I've attached the patches I'm working on (quilt
>>> directory) and the latest gime_raid.pl script.
>
> I tried these on Fedora 15 with mixed results.
>
> I prepared a kernel source tree:
>
> ----
> $ wget http://download.fedora.redhat.com/pub/fedora/linux/development/15/source/SRPMS/kernel-2.6.38-0.rc4.git0.2.fc15.src.rpm

Please use an upstream (kernel.org) kernel.

Contact me with any trouble.

  brassow


