From: Nagilum <nagilum@nagilum.org>
To: linux-raid@vger.kernel.org
Cc: Neil Brown <neilb@suse.de>
Subject: Re: [PATCH 00/18] Assorted md patches headed for 2.6.30
Date: Mon, 16 Feb 2009 18:31:40 +0100
Message-ID: <20090216183140.15250l3o3o2hk8w0@cakebox.home>
In-Reply-To: <18840.64312.247580.780346@notabene.brown>


----- Message from neilb@suse.de ---------
     Date: Mon, 16 Feb 2009 16:35:52 +1100
     From: Neil Brown <neilb@suse.de>
  Subject: Re: [PATCH 00/18] Assorted md patches headed for 2.6.30
       To: Bill Davidsen <davidsen@tmr.com>
       Cc: Julian Cowley <julian@lava.net>, Keld Jorn Simonsen  
<keld@dkuug.dk>, linux-raid@vger.kernel.org

>> Ob. plug for raid5E: the advantages of raid5E are two-fold. The most
>> obvious is that head motion is spread over N+2 drives (N being number of
>> data drives) which improves performance quite a bit in the common small
>> business case of 4-5 drive setups. It also puts some use on each drive,
>> so you don't suddenly start using a drive which may have been spun down
>> for a month, may have developed issues since SMART was last run, etc.
>>
>
> Are you thinking of raid5e, where all the spare space is at the end of
> the devices, or raid5ee where it is more evenly distributed?

raid5E I'd say.

> So raid5e is just a normal raid5 where you don't use all of the space.
> When a failure happens, you reshape to n-1 drives, thus absorbing the
> space.
>
> raid5ee is much like raid6, but you don't read or write the Q block.
> If you lose a drive, you rebuild it in the space where the Q block
> lives.
>
> So would you just use raid6 normally and transition to a contorted
> raid5 on device failure?  Or would you really want to leave those
> blocks fallow?

My understanding is that 5EE leaves those blocks empty. Doing real Q
blocks would entail too much overhead, but it reminds me of an idea I
had some time ago. I call it lazy-RAID6 ;)

Problem: You have enough disks to run RAID6, but you don't want to pay
the performance penalty*) of RAID6.
The usual solution in those cases is RAID5+hotspare, but maybe we can
do better.
We could use the hotspare to store the RAID6 Q syndrome, but only
calculate it (or, more specifically, read the stripe and write the Q
block) when the disks are idle. This of course means that the hotspare
will hold a number of stale blocks after each write operation, but
the majority of blocks will be up-to-date. (Use a bitmap to mark dirty
blocks and "clean up" when the disks are idle.)
The goal behind this is to get essentially the same write performance
as normal RAID5, but with higher failure resilience. In my experience,
hard disks often fail partially, so if you have one partial and one
complete disk failure, chances are you will be able to recover. Even
when two disks fail completely, the number of dirty blocks should
usually be fairly low, so we would be able to recover most of the data.
If there is a single disk failure, we behave like a normal
RAID5+(hot)spare, of course.
It is not intended as a replacement for normal RAID6, but it would give
most of your data about the same protection while maintaining the
speed of RAID5.
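The write path, the idle clean-up and the double-failure recovery described above could be modelled roughly like this. It is a toy sketch under loudly stated assumptions: one byte stands in for each chunk, only data disks are modelled (no rotation of P/Q across drives), the field is the usual RAID6 GF(2^8) with polynomial 0x11d and generator 2, and a Python set stands in for the on-disk dirty bitmap. The class and method names are hypothetical; this is not md code.

```python
# Toy model of "lazy RAID6" (hypothetical names, not md code).
# P is kept current on every write, like RAID5; Q is only recomputed
# for dirty stripes when the array is idle.

def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) using the RAID6 polynomial 0x11d."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def g_pow(i: int) -> int:
    """Return g**i for the generator g = 2."""
    r = 1
    for _ in range(i):
        r = gf_mul(r, 2)
    return r


def gf_inv(a: int) -> int:
    """Brute-force multiplicative inverse (fine for a 256-element field)."""
    for x in range(1, 256):
        if gf_mul(a, x) == 1:
            return x
    raise ZeroDivisionError("0 has no inverse in GF(2^8)")


class LazyRaid6:
    def __init__(self, n_data: int, n_stripes: int):
        self.data = [[0] * n_data for _ in range(n_stripes)]
        self.p = [0] * n_stripes
        self.q = [0] * n_stripes
        self.dirty = set()  # stands in for the dirty bitmap

    def write(self, stripe: int, disk: int, value: int) -> None:
        """Foreground write: RAID5-style read-modify-write of P only."""
        old = self.data[stripe][disk]
        self.data[stripe][disk] = value
        self.p[stripe] ^= old ^ value
        self.dirty.add(stripe)  # Q on the spare is now stale

    def idle_cleanup(self) -> None:
        """Background pass: read each dirty stripe and rewrite its Q."""
        for s in sorted(self.dirty):
            q = 0
            for i, d in enumerate(self.data[s]):
                q ^= gf_mul(g_pow(i), d)
            self.q[s] = q
        self.dirty.clear()

    def recover_two(self, stripe: int, x: int, y: int):
        """Rebuild data disks x and y (x != y) after a double failure.

        The stored values at x and y are ignored, as if lost. Returns
        None if the stripe's Q is dirty, since only P is usable then.
        """
        if stripe in self.dirty:
            return None
        pxy, qxy = self.p[stripe], self.q[stripe]
        for i, d in enumerate(self.data[stripe]):
            if i not in (x, y):  # fold out the surviving chunks
                pxy ^= d
                qxy ^= gf_mul(g_pow(i), d)
        # Now pxy = Dx ^ Dy and qxy = g^x*Dx ^ g^y*Dy.
        gx, gy = g_pow(x), g_pow(y)
        dx = gf_mul(qxy ^ gf_mul(gy, pxy), gf_inv(gx ^ gy))
        return dx, pxy ^ dx


arr = LazyRaid6(n_data=4, n_stripes=8)
for disk, v in enumerate([0x11, 0x22, 0x33, 0x44]):
    arr.write(0, disk, v)
print(arr.recover_two(0, 1, 3))  # Q still dirty: unrecoverable
arr.idle_cleanup()
print(arr.recover_two(0, 1, 3))  # Q clean: both chunks rebuilt
```

The point of the model is that a double failure only loses the stripes still in the dirty set; every other stripe is rebuilt exactly as in normal RAID6.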

*) The main speed advantage of RAID5 vs. RAID6 comes from the fact
that if you write one physical block**) in a RAID5, you only need to
update***) one additional physical block (the parity). If you write a
physical block in a RAID6 (as md currently implements it), you have to
read the whole stripe and then write both parity chunks (P and Q) of
the stripe.
**) A RAID chunk consists of several physical blocks. Several chunks
make up a stripe.
***) i.e. read the old contents, then write the new contents.
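To put rough numbers on the footnote, here is a back-of-the-envelope count of disk I/Os for a single-chunk write. It assumes RAID5 uses read-modify-write and RAID6 uses the full-stripe reconstruct-write described above; real md may choose whichever RAID5 strategy is cheaper, so treat the figures as illustrative. The function names are hypothetical.

```python
def raid5_rmw_ios() -> int:
    """RAID5 read-modify-write of one chunk: read old data and old P,
    then write new data and new P."""
    return 2 + 2


def raid6_rcw_ios(n_data: int) -> int:
    """RAID6 reconstruct-write: read the other n_data - 1 data chunks,
    then write the new data, P and Q."""
    return (n_data - 1) + 3


def lazy_raid6_ios(n_data: int) -> tuple[int, int]:
    """Lazy RAID6: the foreground cost equals RAID5; the deferred idle
    pass reads all n_data data chunks of the stripe and writes Q."""
    return raid5_rmw_ios(), n_data + 1


for n in (3, 4, 6):
    now, later = lazy_raid6_ios(n)
    print(f"{n} data disks: RAID5={raid5_rmw_ios()} "
          f"RAID6={raid6_rcw_ios(n)} lazy={now} now + {later} when idle")
```

The deferred cost is paid once per dirty stripe, so several writes landing in the same stripe before the disks go idle share a single Q update.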

Ok, I hope no one can claim a patent on it now. ;)
Alex.

========================================================================
#    _  __          _ __     http://www.nagilum.org/ \n icq://69646724 #
#   / |/ /__ ____ _(_) /_ ____ _  nagilum@nagilum.org \n +491776461165 #
#  /    / _ `/ _ `/ / / // /  ' \  Amiga (68k/PPC): AOS/NetBSD/Linux   #
# /_/|_/\_,_/\_, /_/_/\_,_/_/_/_/   Mac (PPC): MacOS-X / NetBSD /Linux #
#           /___/     x86: FreeBSD/Linux/Solaris/Win2k  ARM9: EPOC EV6 #
========================================================================


