linux-kernel.vger.kernel.org archive mirror
From: Bill Davidsen <davidsen@tmr.com>
To: Bill Davidsen <davidsen@tmr.com>
Cc: Neil Brown <neilb@suse.de>,
	david@lang.hm, linux-kernel@vger.kernel.org,
	linux-raid@vger.kernel.org
Subject: Re: limits on raid
Date: Thu, 21 Jun 2007 19:03:29 -0400
Message-ID: <467B03C1.50809@tmr.com>
In-Reply-To: <46756BE2.7010401@tmr.com>

I didn't get a comment on my suggestion for a quick and dirty fix for 
--assume-clean issues...

Bill Davidsen wrote:
> Neil Brown wrote:
>> On Thursday June 14, david@lang.hm wrote:
>>  
>>> it's now churning away 'rebuilding' the brand new array.
>>>
>>> a few questions/thoughts.
>>>
>>> why does it need to do a rebuild when making a new array? couldn't 
>>> it just zero all the drives instead? (or better still just record 
>>> most of the space as 'unused' and initialize it as it starts using 
>>> it?)
>>>     
>>
>> Yes, it could zero all the drives first.  But that would take the same
>> length of time (unless p/q generation was very very slow), and you
>> wouldn't be able to start writing data until it had finished.
>> You can "dd" /dev/zero onto all drives and then create the array with
>> --assume-clean if you want to.  You could even write a shell script to
>> do it for you.
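The reason the zero-then-`--assume-clean` route is safe can be checked with a small Python sketch. When every member is all zeros, the RAID-5 XOR parity of the data is zero, and so is the RAID-6 Q syndrome, so the parity already on disk (also zero) matches. The GF(2^8) polynomial 0x11d with generator 2 follows the usual RAID-6 construction. The function names are illustrative only and are not taken from md.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks (RAID-5 P parity)."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x^2 + 1 (0x11d)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11d
        b >>= 1
    return result

def raid6_q(blocks):
    """Q syndrome: XOR over i of g^i * D_i, with generator g = 2."""
    q = bytearray(len(blocks[0]))
    coeff = 1
    for block in blocks:
        for j, byte in enumerate(block):
            q[j] ^= gf_mul(coeff, byte)
        coeff = gf_mul(coeff, 2)
    return bytes(q)

# Usage: four data members after dd if=/dev/zero (tiny chunk for speed).
CHUNK = 16
zeroed = [bytes(CHUNK) for _ in range(4)]
print(xor_blocks(zeroed) == bytes(CHUNK))  # True: zeroed P already matches
print(raid6_q(zeroed) == bytes(CHUNK))     # True: zeroed Q already matches
```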
>>
>> Yes, you could record which space is used vs unused, but I really
>> don't think the complexity is worth it.
>>
>>   
> How about a simple solution which would get an array online and still 
> be safe? All it would take is a flag which forced reconstruct writes 
> for RAID-5. You could set it with an option, or automatically if 
> someone puts --assume-clean with --create, leave it in the superblock 
> until the first "repair" runs to completion. And for repair you could 
> make some assumptions about bad parity not being caused by error but 
> just unwritten.
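Here is a minimal Python sketch of the difference the proposed flag would make. It assumes an array created with `--assume-clean` over disks that were not zeroed, so the on-disk parity is garbage. A read-modify-write trusts the old parity and carries that garbage forward. A reconstruct write recomputes parity from every data chunk in the stripe, so each stripe becomes consistent the first time it is written. `force_reconstruct` is a hypothetical name for the proposed superblock flag, not an existing mdadm option.

```python
from dataclasses import dataclass
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks (RAID-5 parity)."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

@dataclass
class Stripe:
    data: list     # data chunks in this stripe
    parity: bytes  # parity chunk as found on disk

    def consistent(self) -> bool:
        return xor_blocks(self.data) == self.parity

def write_chunk(stripe: Stripe, index: int, new: bytes, force_reconstruct: bool) -> Stripe:
    """Replace one data chunk and update parity.

    force_reconstruct stands in for the proposed (hypothetical) superblock flag.
    """
    data = list(stripe.data)
    old = data[index]
    data[index] = new
    if force_reconstruct:
        # Reconstruct write: parity from every data chunk, old parity ignored.
        parity = xor_blocks(data)
    else:
        # Read-modify-write: P' = P ^ D_old ^ D_new, which trusts the old P.
        parity = xor_blocks([stripe.parity, old, new])
    return Stripe(data, parity)

# Usage: parity 0xff is wrong for this data (correct parity would be 0x07).
s = Stripe([b"\x01" * 4, b"\x02" * 4, b"\x04" * 4], b"\xff" * 4)
print(write_chunk(s, 0, b"\x10" * 4, force_reconstruct=False).consistent())  # False
print(write_chunk(s, 0, b"\x10" * 4, force_reconstruct=True).consistent())   # True
```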
>
> Thought 2: I think the unwritten bit is easier than you think, you 
> only need it on parity blocks for RAID5, not on data blocks. When a 
> write is done, if the bit is set do a reconstruct, write the parity 
> block, and clear the bit. Keeping a bit per data block is madness, and 
> appears to be unnecessary as well.
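The per-stripe "unwritten" bit could look roughly like the following hypothetical sketch. It keeps one bit per parity block (that is, one per stripe), with every bit set at creation. A write to a stripe whose bit is set uses a reconstruct write, and the bit is cleared only once the new parity is on disk. Persisting the map (for example, next to a write-intent bitmap) is left out, and all names here are invented for illustration.

```python
class UnwrittenParityMap:
    """One bit per stripe; a set bit means that stripe's parity was never written."""

    def __init__(self, num_stripes: int):
        self.num_stripes = num_stripes
        # All bits start set: after --create --assume-clean no parity is trusted.
        self.bits = bytearray(b"\xff" * ((num_stripes + 7) // 8))

    def is_unwritten(self, stripe: int) -> bool:
        return bool(self.bits[stripe // 8] & (1 << (stripe % 8)))

    def clear(self, stripe: int) -> None:
        self.bits[stripe // 8] &= ~(1 << (stripe % 8)) & 0xFF

def write_mode(pmap: UnwrittenParityMap, stripe: int) -> str:
    """Pick the parity update strategy for a write to `stripe`."""
    return "reconstruct-write" if pmap.is_unwritten(stripe) else "read-modify-write"

def after_parity_write(pmap: UnwrittenParityMap, stripe: int) -> None:
    """Parity for `stripe` is now correct on disk, so RMW is safe from here on."""
    pmap.clear(stripe)

def map_bytes(array_bytes: int, chunk_bytes: int, data_disks: int) -> int:
    """Size of the map: one bit per stripe of chunk_bytes * data_disks."""
    stripes = -(-array_bytes // (chunk_bytes * data_disks))  # ceiling division
    return (stripes + 7) // 8

# Usage
pmap = UnwrittenParityMap(10)
print(write_mode(pmap, 3))  # reconstruct-write
after_parity_write(pmap, 3)
print(write_mode(pmap, 3))  # read-modify-write
print(write_mode(pmap, 4))  # reconstruct-write
```

For scale, a 1 TiB array with 64 KiB chunks and four data disks has 2^22 stripes, so under these assumptions the map needs 512 KiB.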
>>> while I consider zfs to be ~80% hype, one advantage it could have 
>>> (but I don't know if it has) is that since the filesystem and raid 
>>> are integrated into one layer they can optimize the case where files 
>>> are being written onto unallocated space and instead of reading 
>>> blocks from disk to calculate the parity they could just put zeros 
>>> in the unallocated space, potentially speeding up the system by 
>>> reducing the amount of disk I/O.
>>>     
>>
>> Certainly.  But the raid doesn't need to be tightly integrated
>> into the filesystem to achieve this.  The filesystem need only know
>> the geometry of the RAID and when it comes to write, it tries to write
>> full stripes at a time.  If that means writing some extra blocks full
>> of zeros, it can try to do that.  This would require a little bit
>> better communication between filesystem and raid, but not much.  If
>> anyone has a filesystem that they want to be able to talk to raid
>> better, they need only ask...
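The full-stripe approach can be sketched as follows. The sketch assumes the filesystem knows the chunk size and the number of data disks, and that any padded range is unallocated space. The write is extended to stripe boundaries with zeros, and parity is computed purely from the in-memory buffer, so nothing has to be read back from disk. The helper names are hypothetical, and no existing raid/filesystem interface is implied.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks (RAID-5 parity)."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def pad_to_full_stripes(offset, data, chunk, data_disks):
    """Extend a write to stripe boundaries, filling the gaps with zeros.

    Only valid if the padding covers space the filesystem knows is unallocated.
    Returns (aligned start offset, padded buffer).
    """
    stripe = chunk * data_disks
    start = offset - offset % stripe
    end = offset + len(data)
    end += (-end) % stripe  # round up to the next stripe boundary
    buf = bytearray(end - start)
    buf[offset - start : offset - start + len(data)] = data
    return start, bytes(buf)

def stripe_parities(buf, chunk, data_disks):
    """Parity for each full stripe in buf, computed without any disk reads."""
    stripe = chunk * data_disks
    out = []
    for s in range(0, len(buf), stripe):
        chunks = [buf[s + i * chunk : s + (i + 1) * chunk] for i in range(data_disks)]
        out.append(xor_blocks(chunks))
    return out

# Usage: 4-byte chunks, 3 data disks, a 6-byte write at offset 5.
start, buf = pad_to_full_stripes(5, b"abcdef", chunk=4, data_disks=3)
print(start, len(buf))                              # 0 12
print(stripe_parities(buf, chunk=4, data_disks=3))  # [b'd\x04\x04c']
```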
>>  
>>  
>>> is there any way that linux would be able to do this sort of thing? 
>>> or is it impossible due to the layering preventing the necessary 
>>> knowledge from being in the right place?
>>>     
>>
>> Linux can do anything we want it to.  Interfaces can be changed.  All
>> it takes is a fairly well defined requirement, and the will to make it
>> happen (and some technical expertise, and lots of time .... and
>> coffee?).
>>   
> Well, I gave you two thoughts, one which would be slow until a repair 
> but sounds easy to do, and one which is slightly harder but works 
> better and minimizes performance impact.
>


-- 
bill davidsen <davidsen@tmr.com>
  CTO TMR Associates, Inc
  Doing interesting things with small computers since 1979

