Message-ID: <0ad101c62071$261ebde0$03c8a8c0@kroptech.com>
From: "Adam Kropelin"
To: "Neil Brown"
Cc: "Steinar H. Gunderson"
References: <20060117174531.27739.patches@notabene> <20060121234234.A32306@mail.kroptech.com> <17364.3266.29943.914861@cse.unsw.edu.au>
Subject: Re: [PATCH 000 of 5] md: Introduction
Date: Mon, 23 Jan 2006 18:02:56 -0500
X-Mailing-List: linux-kernel@vger.kernel.org

Neil Brown wrote:
> On Saturday January 21, akropel1@rochester.rr.com wrote:
>> On the first try I neglected to read the directions and increased the
>> number of devices first (which worked) and then attempted to add the
>> physical device (which didn't work; at least not the way I intended).
>
> Thanks, this is exactly the sort of feedback I was hoping for - people
> testing things that I didn't think to...
>
>> mdadm --create -l5 -n3 /dev/md0 /dev/sda /dev/sdb /dev/sdc
>>
>> md0 : active raid5 sda[0] sdc[2] sdb[1]
>>       2097024 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]
>>
>> mdadm --grow -n4 /dev/md0
>>
>> md0 : active raid5 sda[0] sdc[2] sdb[1]
>>       3145536 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_]
>
> I assume that no "resync" started at this point? It should have done.

Actually, it did start a resync. Sorry, I should have mentioned that. I
waited until the resync completed before I issued the 'mdadm --add'
command.

>> md0 : active raid5 sdd[3] sdc[2] sdb[1] sda[0]
>>       2097024 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]
>>
>> ...should this be [4/3] [UUU_] perhaps?
>
> Well, part of the array is "4/4 UUUU" and part is "3/3 UUU". How do
> you represent that? I think "4/4 UUUU" is best.

I see your point. I was expecting some indication that my array was
vulnerable and that the new disk was not fully utilized yet. I guess
the resync-in-progress indicator is sufficient.

>> My final test was a repeat of #2, but with data actively being written
>> to the array during the reshape (the previous tests were on an idle,
>> unmounted array). This one failed pretty hard, with several processes
>> ending up in the D state.
>
> Hmmm... I tried similar things but didn't get this deadlock. Somehow
> the fact that mdadm is holding the reconfig_sem semaphore means that
> some IO cannot proceed and so mdadm cannot grab and resize all the
> stripe heads... I'll have to look more deeply into this.

For what it's worth, I'm using the BusLogic SCSI driver for the disks
in the array.

>> I'm happy to do more tests. It's easy to conjure up virtual disks and
>> load them with irrelevant data (like kernel trees ;)
>
> Great. I'll probably be putting out a new patch set late this week
> or early next. Hopefully it will fix the issues you found and you
> can try it again..

Looking forward to it...
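For anyone following along, here is the order I understand the directions
to intend (a sketch only; I haven't re-run it in exactly this form, and the
device names are just the ones from my test setup above):

  # add the new disk to the array as a spare first...
  mdadm --add /dev/md0 /dev/sdd
  # ...then grow the device count so the array reshapes onto it
  mdadm --grow -n4 /dev/md0
  # and watch the reshape/resync progress until it completes
  cat /proc/mdstat
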
--Adam