linux-kernel.vger.kernel.org archive mirror
From: Neil Brown <neilb@suse.de>
To: Adam Kropelin <akropel1@rochester.rr.com>
Cc: NeilBrown <neilb@suse.de>,
	linux-raid@vger.kernel.org, linux-kernel@vger.kernel.org,
	"Steinar H. Gunderson" <sgunderson@bigfoot.com>
Subject: Re: [PATCH 000 of 5] md: Introduction
Date: Mon, 23 Jan 2006 09:52:50 +1100
Message-ID: <17364.3266.29943.914861@cse.unsw.edu.au>
In-Reply-To: message from Adam Kropelin on Saturday January 21

On Saturday January 21, akropel1@rochester.rr.com wrote:
> NeilBrown <neilb@suse.de> wrote:
> > In line with the principle of "release early", following are 5 patches
> > against md in 2.6.latest which implement reshaping of a raid5 array.
> > By this I mean adding 1 or more drives to the array and then re-laying
> > out all of the data.
> 
> I've been looking forward to a feature like this, so I took the
> opportunity to set up a vmware session and give the patches a try. I
> encountered both success and failure, and here are the details of both.
> 
> On the first try I neglected to read the directions and increased the
> number of devices first (which worked) and then attempted to add the
> physical device (which didn't work; at least not the way I intended).
> The result was an array of size 4, operating in degraded mode, with 
> three active drives and one spare. I was unable to find a way to coax
> mdadm into adding the 4th drive as an active device instead of a 
> spare. I'm not an mdadm guru, so there may be a method I overlooked.
> Here's what I did, interspersed with trimmed /proc/mdstat output:

Thanks, this is exactly the sort of feedback I was hoping for - people
testing things that I didn't think to...

> 
>   mdadm --create -l5 -n3 /dev/md0 /dev/sda /dev/sdb /dev/sdc
> 
>     md0 : active raid5 sda[0] sdc[2] sdb[1]
>           2097024 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]
> 
>   mdadm --grow -n4 /dev/md0
> 
>     md0 : active raid5 sda[0] sdc[2] sdb[1]
>           3145536 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_]

I assume that no "resync" started at this point?  It should have done.

> 
>   mdadm --manage --add /dev/md0 /dev/sdd
> 
>     md0 : active raid5 sdd[3](S) sda[0] sdc[2] sdb[1]
>           3145536 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_]
> 
>   mdadm --misc --stop /dev/md0
>   mdadm --assemble /dev/md0 /dev/sda /dev/sdb /dev/sdc /dev/sdd
> 
>     md0 : active raid5 sdd[3](S) sda[0] sdc[2] sdb[1]
>           3145536 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_]

This really should have started a recovery.... I'll look into that
too.
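
One way to check from a script whether md has started a resync or
recovery is to look for the progress line that /proc/mdstat prints under
the array. The following is a minimal sketch only: `md_sync_action` is a
hypothetical helper name (not part of mdadm), and the list of action
words is an assumption based on the "recovery =" line quoted later in
this message plus the "resync" mentioned above; "reshape" is included on
the assumption that a kernel may report it.

```shell
# Print the sync action reported for one array, or "idle" if none.
# Reads mdstat-formatted text on stdin; $1 is the array name, e.g. md0.
md_sync_action() {
    awk -v dev="$1" '
        # Start of the stanza for the requested array.
        $1 == dev && $2 == ":" { in_dev = 1; next }
        # A blank line ends the stanza.
        in_dev && NF == 0      { exit }
        # Progress line, e.g. "[>....]  recovery =  0.4% (...)".
        in_dev && match($0, /(resync|recovery|reshape) *=/) {
            s = substr($0, RSTART, RLENGTH)
            sub(/ *=$/, "", s)
            print s
            found = 1
            exit
        }
        END { if (!found) print "idle" }
    '
}
```

Typical use would be `md_sync_action md0 < /proc/mdstat`. For the two
degraded `[4/3] [UUU_]` listings above, which show no progress line,
this sketch would print "idle".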


> 
> For my second try I actually read the directions and things went much
> better, aside from a possible /proc/mdstat glitch shown below.
> 
>   mdadm --create -l5 -n3 /dev/md0 /dev/sda /dev/sdb /dev/sdc
> 
>     md0 : active raid5 sda[0] sdc[2] sdb[1]
>           2097024 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]
> 
>   mdadm --manage --add /dev/md0 /dev/sdd
> 
>     md0 : active raid5 sdd[3](S) sdc[2] sdb[1] sda[0]
>           2097024 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]
> 
>   mdadm --grow -n4 /dev/md0
> 
>     md0 : active raid5 sdd[3] sdc[2] sdb[1] sda[0]
>           2097024 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]
>                                 ...should this be... --> [4/3] [UUU_] perhaps?

Well, part of the array is "4/4 UUUU" and part is "3/3 UUU".  How do
you represent that?  I think "4/4 UUUU" is best.
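
For reference, the bracketed fields being discussed are the number of
devices the array is configured for, the number currently active, and
one character per slot ("U" for a working member, "_" for a missing
one). A small sketch for extracting them from a status line, assuming
the layout shown in the listings above; `md_member_state` is a
hypothetical helper name.

```shell
# Extract "<configured> <active> <slot-map>" from an mdstat status line.
# $1: a line such as "... algorithm 2 [4/3] [UUU_]"
# Prints nothing if the line has no "[n/m] [UU..]" pair.
md_member_state() {
    printf '%s\n' "$1" |
        sed -n 's/.*\[\([0-9][0-9]*\)\/\([0-9][0-9]*\)\] \[\([U_]*\)\].*/\1 \2 \3/p'
}
```

Given the status line from the first test, this prints "4 3 UUU_". A
single slot map like this cannot express an array that is partly in the
old 3-device layout and partly in the new 4-device one, which is the
difficulty raised above.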


>           [>....................]  recovery =  0.4% (5636/1048512) finish=9.1min speed=1878K/sec
> 
>     [...time passes...]
> 
>     md0 : active raid5 sdd[3] sdc[2] sdb[1] sda[0]
>           3145536 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]
> 
> My final test was a repeat of #2, but with data actively being written
> to the array during the reshape (the previous tests were on an idle,
> unmounted array). This one failed pretty hard, with several processes
> ending up in the D state. I repeated it twice and sysrq-t dumps can be
> found at <http://www.kroptech.com/~adk0212/md-raid5-reshape-wedge.txt>.
> The writeout load was a kernel tree untar started shortly before the
> 'mdadm --grow' command was given. mdadm hung, as did tar. Any process
> which subsequently attempted to access the array hung as well. A second
> attempt at the same thing hung similarly, although only pdflush shows up
> hung in that trace. mdadm and tar are missing for some reason.

Hmmm... I tried similar things but didn't get this deadlock.  Somehow
the fact that mdadm is holding the reconfig_sem semaphore means that
some IO cannot proceed and so mdadm cannot grab and resize all the
stripe heads... I'll have to look more deeply into this.
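
When reproducing a hang like this, it can help to record which
processes are stuck in uninterruptible sleep (state "D") before taking
the sysrq-t dump. A minimal sketch, assuming a procps-style `ps`;
`d_state_procs` is a hypothetical helper name, and only the first word
of each command name is reported.

```shell
# Filter "PID STAT COMMAND" lines on stdin, printing "PID COMMAND" for
# every process whose state begins with D (uninterruptible sleep).
d_state_procs() {
    awk '$2 ~ /^D/ { print $1, $3 }'
}
```

Typical use would be `ps -eo pid=,stat=,comm= | d_state_procs`, where
the trailing "=" on each column name suppresses the header line.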

> 
> I'm happy to do more tests. It's easy to conjure up virtual disks and
> load them with irrelevant data (like kernel trees ;)

Great.  I'll probably be putting out a new patch set late this week
or early next.  Hopefully it will fix the issues you have found and you
can try it again.


Thanks again,
NeilBrown
