From: Bill Davidsen <davidsen@tmr.com>
To: Bill Davidsen <davidsen@tmr.com>
Cc: Neil Brown <neilb@suse.de>,
david@lang.hm, linux-kernel@vger.kernel.org,
linux-raid@vger.kernel.org
Subject: Re: limits on raid
Date: Thu, 21 Jun 2007 19:03:29 -0400
Message-ID: <467B03C1.50809@tmr.com>
In-Reply-To: <46756BE2.7010401@tmr.com>

I didn't get a comment on my suggestion for a quick and dirty fix for
--assume-clean issues...

Bill Davidsen wrote:
> Neil Brown wrote:
>> On Thursday June 14, david@lang.hm wrote:
>>
>>> it's now churning away 'rebuilding' the brand new array.
>>>
>>> a few questions/thoughts.
>>>
>>> why does it need to do a rebuild when making a new array? couldn't
>>> it just zero all the drives instead? (or better still just record
>>> most of the space as 'unused' and initialize it as it starts using
>>> it?)
>>>
>>
>> Yes, it could zero all the drives first. But that would take the same
>> length of time (unless p/q generation was very very slow), and you
>> wouldn't be able to start writing data until it had finished.
>> You can "dd" /dev/zero onto all drives and then create the array with
>> --assume-clean if you want to. You could even write a shell script to
>> do it for you.
>>
>> Yes, you could record which space is used vs unused, but I really
>> don't think the complexity is worth it.
>>
>>
> How about a simple solution which would get an array on line and still
> be safe? All it would take is a flag which forced reconstruct writes
> for RAID-5. You could set it with an option, or automatically if
> someone uses --assume-clean with --create, and leave it in the
> superblock until the first "repair" runs to completion. And for repair
> you could assume that bad parity was not caused by an error but was
> simply never written.
>
> Thought 2: I think the unwritten bit is easier than you think. You
> only need it on parity blocks for RAID-5, not on data blocks. When a
> write is done, if the bit is set, do a reconstruct write, write the
> parity block, and clear the bit. Keeping a bit per data block is
> madness, and appears to be unnecessary as well.
>
>>> while I consider zfs to be ~80% hype, one advantage it could have
>>> (but I don't know if it has) is that since the filesystem and raid
>>> are integrated into one layer they can optimize the case where files
>>> are being written onto unallocated space and instead of reading
>>> blocks from disk to calculate the parity they could just put zeros
>>> in the unallocated space, potentially speeding up the system by
>>> reducing the amount of disk I/O.
>>>
>>
>> Certainly. But the raid doesn't need to be tightly integrated
>> into the filesystem to achieve this. The filesystem need only know
>> the geometry of the RAID and when it comes to write, it tries to write
>> full stripes at a time. If that means writing some extra blocks full
>> of zeros, it can try to do that. This would require a little bit
>> better communication between filesystem and raid, but not much. If
>> anyone has a filesystem that they want to be able to talk to raid
>> better, they need only ask...
>>
>>
>>> is there any way that linux would be able to do this sort of thing?
>>> or is it impossible due to the layering preventing the necessary
>>> knowledge from being in the right place?
>>>
>>
>> Linux can do anything we want it to. Interfaces can be changed. All
>> it takes is a fairly well defined requirement, and the will to make it
>> happen (and some technical expertise, and lots of time .... and
>> coffee?).
>>
> Well, I gave you two thoughts, one which would be slow until a repair
> but sounds easy to do, and one which is slightly harder but works
> better and minimizes performance impact.
>
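
To make the second thought concrete, here is a toy sketch in Python. It is
not md code; the names Stripe, parity_unwritten and xor_blocks are made up
for illustration, and a real implementation would keep the bit on disk. It
only shows why a read-modify-write against never-initialized parity leaves
the stripe inconsistent, while a reconstruct write (parity recomputed from
all data blocks) fixes it and lets the bit be cleared. The first thought is
the same idea with one array-wide flag instead of one bit per parity block.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, as RAID-5 parity does."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


class Stripe:
    """Toy model of one RAID-5 stripe: several data blocks and one parity block."""

    def __init__(self, data, parity, parity_unwritten=True):
        self.data = list(data)
        self.parity = parity
        # Hypothetical per-parity-block bit: set until parity has been
        # computed from the data blocks at least once.
        self.parity_unwritten = parity_unwritten

    def write(self, index, new_block):
        if self.parity_unwritten:
            # Reconstruct write: ignore the old parity and rebuild it
            # from every data block in the stripe, then clear the bit.
            self.data[index] = new_block
            self.parity = xor_blocks(self.data)
            self.parity_unwritten = False
        else:
            # Read-modify-write: only valid if the old parity was correct.
            old_block = self.data[index]
            self.parity = xor_blocks([self.parity, old_block, new_block])
            self.data[index] = new_block

    def parity_ok(self):
        return self.parity == xor_blocks(self.data)


# A freshly created --assume-clean array: whatever was on the disks before.
garbage = [b"\x13\x37", b"\xde\xad", b"\xbe\xef"]
stale_parity = b"\x00\x00"

# Without the bit, the first write trusts the stale parity.
unsafe = Stripe(garbage, stale_parity, parity_unwritten=False)
unsafe.write(0, b"\x01\x02")

# With the bit set, the first write reconstructs parity and clears the bit.
safe = Stripe(garbage, stale_parity, parity_unwritten=True)
safe.write(0, b"\x01\x02")
```

After these writes, unsafe.parity_ok() is false and safe.parity_ok() is
true; later writes to the safe stripe can use the cheaper read-modify-write
path because the bit is already clear.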
--
bill davidsen <davidsen@tmr.com>
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979