linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Ric Wheeler <rwheeler@redhat.com>
To: Daniel Shiels <btrfs721@tofubar.com>
Cc: Michael Tokarev <mjt@tls.msk.ru>,
	Daniel Taylor <daniel.taylor@wdc.com>,
	Mike Fedyk <mfedyk@mikefedyk.com>,
	Daniel J Blueman <daniel.blueman@gmail.com>,
	Mat <jackdachef@gmail.com>, LKML <linux-kernel@vger.kernel.org>,
	linux-fsdevel@vger.kernel.org,
	Chris Mason <chris.mason@oracle.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	The development of BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: Btrfs: broken file system design (was Unbound(?) internal    fragmentation in Btrfs)
Date: Sat, 26 Jun 2010 09:47:51 -0400	[thread overview]
Message-ID: <4C260507.4020409@redhat.com> (raw)
In-Reply-To: <57784.2001:5c0:82dc::2.1277555665.squirrel@www.tofubar.com>

On 06/26/2010 08:34 AM, Daniel Shiels wrote:
>> 25.06.2010 22:58, Ric Wheeler wrote:
>>      
>>> On 06/24/2010 06:06 PM, Daniel Taylor wrote:
>>>        
>> []
>>      
>>>>> On Wed, Jun 23, 2010 at 8:43 PM, Daniel Taylor
>>>>> <Daniel.Taylor@wdc.com>   wrote:
>>>>>
>>>>>            
>>>>>> Just an FYI reminder.  The original test (2K files) is utterly
>>>>>> pathological for disk drives with 4K physical sectors, such as
>>>>>> those now shipping from WD, Seagate, and others.  Some of the
>>>>>> SSDs have larger (16K0 or smaller blocks (2K).  There is also
>>>>>> the issue of btrfs over RAID (which I know is not entirely
>>>>>> sensible, but which will happen).
>>>>>>              
>> Why it is not sensible to use btrfs on raid devices?
>> Nowadays raid is just everywhere, from 'fakeraid' on AHCI to
>> large external arrays on iSCSI-attached storage.  Sometimes
>> it is nearly imposisble to _not_ use RAID, -- many servers
>> comes with a built-in RAID card which can't be turned off or
>> disabled.  And hardware raid is faster (at least in theory)
>> at least because it puts less load on various system busses.
>>
>> To many "enterprise folks" a statement "we don't need hw raid,
>> we have better solution" sounds like "we're just a toy, don't
>> use".
>>
>> Hmm?  ;)
>>
>> /mjt, who always used and preferred _software_ raid due to
>>   multiple reasons, and never used btrfs so far.
>>      
> Its not that you shouldn't use it on raid it's just it looses some value
> from the file system.
>
> Two nice features that btrfs provides are checksums and mirroring. If a
> disk corrupts a block then btrfs will realize due to the strong checksum
> and use the mirrored block. If you are using a raid system the raid won't
> know the data is corrupted and raid doesn't provide a way for the file
> system to get to the redundant block.
>
> I read a paper from Sun a while back about the undetected read failure
> rates for modern disks having not changed for many years. Disks are so
> large now that undetected failures are unacceptably likely for many
> systems. Hence zfs doing similar in file system raid schemes.
>
> In my lab I used dd to clobber data in some of my mirrors. Btrfs logs lots
> of checksum errors but never corrupted a file. Doing the same on a classic
> raid with classic filesystem (solaris with veritas volume manager)
> silently gave me bad data depending on what disk it felt like reading
> from.
>
> Daniel.
>    

I was (one of many) people who worked at EMC on designing storage 
arrays. If you are using any high end, external hardware array, it will 
detect data corruption pro-actively for you. Most arrays do continual 
scans for latent errors and have internal data integrity checks that are 
used for this.

Note that DIF/DIX adds an extra 8 bytes of data integrity to newer 
standards disks. We don't do anything with that today in btrfs, but you 
could imagine ways to get even better data integrity protection.

If you are using software RAID (MD), you should also use its internal 
checks to do this kind of proactive detection of latent errors on a 
regular basis (say once every week or two).

Ric

  parent reply	other threads:[~2010-06-26 13:47 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-06-03 14:58 Unbound(?) internal fragmentation in Btrfs Edward Shishkin
     [not found] ` <AANLkTilKw2onQkdNlZjg7WVnPu2dsNpDSvoxrO_FA2z_@mail.gmail.com>
2010-06-18  8:03   ` Christian Stroetmann
2010-06-18 13:32   ` Btrfs: broken file system design (was Unbound(?) internal fragmentation in Btrfs) Edward Shishkin
2010-06-18 13:45     ` Daniel J Blueman
2010-06-18 16:50       ` Edward Shishkin
2010-06-23 23:40         ` Jamie Lokier
2010-06-24  3:43           ` Daniel Taylor
2010-06-24  4:51             ` Mike Fedyk
2010-06-24 22:06               ` Daniel Taylor
2010-06-25  9:15                 ` Btrfs: broken file system design Andi Kleen
2010-06-25 18:58                 ` Btrfs: broken file system design (was Unbound(?) internal fragmentation in Btrfs) Ric Wheeler
2010-06-26  5:18                   ` Michael Tokarev
2010-06-26 11:55                     ` Ric Wheeler
     [not found]                     ` <57784.2001:5c0:82dc::2.1277555665.squirrel@www.tofubar.com>
2010-06-26 13:47                       ` Ric Wheeler [this message]
2010-06-24  9:50             ` David Woodhouse
2010-06-18 18:15       ` Christian Stroetmann
2010-06-18 13:47     ` Chris Mason
2010-06-18 15:05       ` Edward Shishkin
     [not found]       ` <4C1B8B4A.9060308@gmail.com>
2010-06-18 15:10         ` Chris Mason
2010-06-18 16:22           ` Edward Shishkin
     [not found]           ` <4C1B9D4F.6010008@gmail.com>
2010-06-18 18:10             ` Chris Mason
2010-06-18 15:21       ` Christian Stroetmann
2010-06-18 15:22         ` Chris Mason
2010-06-18 15:56     ` Jamie Lokier
2010-06-18 19:25       ` Christian Stroetmann
2010-06-18 19:29       ` Edward Shishkin
2010-06-18 19:35         ` Chris Mason
2010-06-18 22:04           ` Balancing leaves when walking from top to down (was Btrfs:...) Edward Shishkin
     [not found]           ` <4C1BED56.9010300@redhat.com>
2010-06-18 22:16             ` Ric Wheeler
2010-06-19  0:03               ` Edward Shishkin
2010-06-21 13:15             ` Chris Mason
     [not found]               ` <20100621180013.GD17979@think>
2010-06-22 14:12                 ` Edward Shishkin
2010-06-22 14:20                   ` Chris Mason
2010-06-23 13:46                     ` Edward Shishkin
     [not found]                     ` <4C221049.501@gmail.com>
2010-06-23 23:37                       ` Jamie Lokier
2010-06-24 13:06                         ` Chris Mason
2010-06-30 20:05                           ` Edward Shishkin
     [not found]                           ` <4C2BA381.7040808@redhat.com>
2010-06-30 21:12                             ` Chris Mason
2010-07-09  4:16                 ` Chris Samuel
2010-07-09 20:30                   ` Chris Mason
2010-06-23 23:57         ` Btrfs: broken file system design (was Unbound(?) internal fragmentation in Btrfs) Jamie Lokier

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4C260507.4020409@redhat.com \
    --to=rwheeler@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=btrfs721@tofubar.com \
    --cc=chris.mason@oracle.com \
    --cc=daniel.blueman@gmail.com \
    --cc=daniel.taylor@wdc.com \
    --cc=jackdachef@gmail.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mfedyk@mikefedyk.com \
    --cc=mjt@tls.msk.ru \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).