* btrfs RAID with enterprise SATA or SAS drives
@ 2012-05-09 22:01 Daniel Pocock
  2012-05-10 19:58 ` Hubert Kario
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Daniel Pocock @ 2012-05-09 22:01 UTC (permalink / raw)
  To: linux-btrfs



There is various information about
- enterprise-class drives (either SAS or just enterprise SATA)
- the SCSI/SAS protocols themselves vs SATA
having more advanced features (e.g. for dealing with error conditions)
than the average block device

For example, Adaptec claims that such drives work better with
their hardware RAID cards:

http://ask.adaptec.com/cgi-bin/adaptec_tic.cfg/php/enduser/std_adp.php?p_faqid=14596
"Desktop class disk drives have an error recovery feature that will
result in a continuous retry of the drive (read or write) when an error
is encountered, such as a bad sector. In a RAID array this can cause the
RAID controller to time-out while waiting for the drive to respond."

and this blog:
http://www.adaptec.com/blog/?p=901
"major advantages to enterprise drives (TLER for one) ... opt for the
enterprise drives in a RAID environment no matter what the cost of the
drive over the desktop drive"

My questions:

- does btrfs RAID1 actively use the more advanced features of these
drives, e.g. to work around errors without getting stuck on a bad block?

- if a non-RAID SAS card is used, does it matter which card is chosen?
Does btrfs work equally well with all of them?

- ignoring the better MTBF and seek times of these drives, do any of the
other features passively contribute to a better RAID experience when
using btrfs?

- for someone using SAS or enterprise SATA drives with Linux, I
understand btrfs gives the extra benefit of checksums, are there any
other specific benefits over using mdadm or dmraid?



* Re: btrfs RAID with enterprise SATA or SAS drives
  2012-05-09 22:01 btrfs RAID with enterprise SATA or SAS drives Daniel Pocock
@ 2012-05-10 19:58 ` Hubert Kario
  2012-05-18 16:19   ` btrfs RAID with RAID cards (thread renamed) Daniel Pocock
  2012-05-11  2:18 ` btrfs RAID with enterprise SATA or SAS drives Duncan
  2014-07-09 14:48 ` Martin Steigerwald
  2 siblings, 1 reply; 10+ messages in thread
From: Hubert Kario @ 2012-05-10 19:58 UTC (permalink / raw)
  To: Daniel Pocock; +Cc: linux-btrfs

On Wednesday 09 of May 2012 22:01:49 Daniel Pocock wrote:
> There is various information about
> - enterprise-class drives (either SAS or just enterprise SATA)
> - the SCSI/SAS protocols themselves vs SATA
> having more advanced features (e.g. for dealing with error conditions)
> than the average block device
> 
> For example, Adaptec claims that such drives work better with
> their hardware RAID cards:
> 
> http://ask.adaptec.com/cgi-bin/adaptec_tic.cfg/php/enduser/std_adp.php?p_faqid=14596
> "Desktop class disk drives have an error recovery feature that will
> result in a continuous retry of the drive (read or write) when an error
> is encountered, such as a bad sector. In a RAID array this can cause the
> RAID controller to time-out while waiting for the drive to respond."
> 
> and this blog:
> http://www.adaptec.com/blog/?p=901
> "major advantages to enterprise drives (TLER for one) ... opt for the
> enterprise drives in a RAID environment no matter what the cost of the
> drive over the desktop drive"
> 
> My questions:
> 
> - does btrfs RAID1 actively use the more advanced features of these
> drives, e.g. to work around errors without getting stuck on a bad block?

There are no (short) timeouts in btrfs that I know of.

> - if a non-RAID SAS card is used, does it matter which card is chosen?
> Does btrfs work equally well with all of them?

If you're using btrfs RAID, you need a HBA, not a RAID card. If the RAID
card can work as a HBA (usually labelled as JBOD mode) then you're good to
go.

For example, HP CCISS controllers can't work in JBOD mode.

If you're using the RAID feature of the card, then you need to look at
general Linux support; btrfs doesn't do anything other filesystems don't
do with the block devices.
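
For a plain HBA setup there is nothing more to configure - the RAID is
created at mkfs time. A minimal sketch (device names are placeholders):

  mkfs.btrfs -m raid1 -d raid1 /dev/sdb /dev/sdc
  mount /dev/sdb /mnt
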
> - ignoring the better MTBF and seek times of these drives, do any of the
> other features passively contribute to a better RAID experience when
> using btrfs?

Whether they really have higher MTBF values is debatable...

Seek times do matter very much to btrfs; a fast CPU is also a good thing
to have with btrfs, especially if you want to use data compression or
high node or leaf sizes.
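
For example (a sketch - larger metadata blocks need a recent kernel, and
the exact option spelling depends on your btrfs-progs version):

  mkfs.btrfs -l 16384 -n 16384 /dev/sdb   # 16k leaf/node size
  mount -o compress=lzo /dev/sdb /mnt     # transparent LZO compression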

> - for someone using SAS or enterprise SATA drives with Linux, I
> understand btrfs gives the extra benefit of checksums, are there any
> other specific benefits over using mdadm or dmraid?

Because btrfs knows when a drive is misbehaving (because of checksums)
and is returning bad data, it can detect problems much faster than RAID
(which doesn't use the redundancy to check whether the data it's
returning is actually correct). Both hardware and software RAID
implementations depend on the drives to return IO errors. In effect, the
data is safer on btrfs than on regular RAID.

Besides that, there's online resize (both shrinking and growing) and the
(currently not implemented) ability to set the redundancy level on a
per-file basis. In other words, with btrfs you could have a file with
RAID6 redundancy and a second one with RAID10-level redundancy in a
single directory.
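
For example, surfacing checksum errors and resizing online (sketches;
"device stats" needs a fairly new kernel):

  btrfs scrub start -B /mnt            # verify checksums, repair from the
                                       # good mirror where possible
  btrfs device stats /mnt              # per-device error counters
  btrfs filesystem resize -10g /mnt    # shrink by 10 GiB while mounted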

Regards,
-- 
Hubert Kario
QBS - Quality Business Software
02-656 Warszawa, ul. Ksawerów 30/85
tel. +48 (22) 646-61-51, 646-74-24
www.qbs.com.pl


* Re: btrfs RAID with enterprise SATA or SAS drives
  2012-05-09 22:01 btrfs RAID with enterprise SATA or SAS drives Daniel Pocock
  2012-05-10 19:58 ` Hubert Kario
@ 2012-05-11  2:18 ` Duncan
  2012-05-11 16:58   ` Martin Steigerwald
  2014-07-09 14:48 ` Martin Steigerwald
  2 siblings, 1 reply; 10+ messages in thread
From: Duncan @ 2012-05-11  2:18 UTC (permalink / raw)
  To: linux-btrfs

Daniel Pocock posted on Wed, 09 May 2012 22:01:49 +0000 as excerpted:

> There is various information about
> - enterprise-class drives (either SAS or just enterprise SATA)
> - the SCSI/SAS protocols themselves vs SATA having more advanced
> features (e.g. for dealing with error conditions)
> than the average block device

This isn't a direct answer to that, but expressing a bit of concern over 
the implications of your question, that you're planning on using btrfs in 
an enterprise class installation.

While various Enterprise Linux distributions do now officially "support" 
btrfs, it's worth checking out exactly what that means in practice.

Meanwhile, in mainline Linux kernel terms, btrfs remains very much an 
experimental filesystem, as expressed by the kernel config option that 
turns btrfs on.  It's still under very intensive development, with an 
error-fixing btrfsck only recently available and still coming with its 
own "may make the problems worse instead of fixing them" warning.  
Testers willing to risk the chance of data loss implied by that 
"experimental filesystem" label should be running the latest stable 
kernel at the oldest, and preferably the rcs by rc5 or so: new kernels 
continue to fix problems in older btrfs code as well as introduce new 
features, so if you're running an older kernel, you're running a kernel 
with known problems that are fixed in the latest one.

Experimental also has implications in terms of backups.  A good sysadmin 
always has backups, but normally, the working copy can be considered the 
primary copy, and there's backups of that.  On an experimental filesystem 
under as intense continued development as btrfs, by contrast, it's best 
to consider your btrfs copy an extra "throwaway" copy only intended for 
testing.  You still have your primary copy, along with all the usual 
backups, on something less experimental, since you never know when/where/
how your btrfs testing will screw up its copy.

That's not normally the kind of filesystem "enterprise class" users are 
looking for, unless of course they're doing longer term testing, with an 
intent to actually deploy perhaps a year out, if the testing proves it 
robust enough by then.

And while it's still experimental ATM, btrfs /is/ fast improving.  It 
/does/ now have a working fsck, even if it still comes with warnings, 
and reasonable feature-set build-out should be within a few more kernels 
(raid5/6 mode is roadmapped for 3.5, and n-way-mirroring raid1/10 is 
roadmapped after that; current "raid1" mode is only 2-way mirroring, 
regardless of the number of drives).  After that, the focus should turn 
toward full stabilization.  So while btrfs is currently intended for 
testers only, by around the end of the year or early next, it will likely 
be reasonably stable and ready for at least the more adventurous 
conventional users.  Still, enterprise class users tend to be a 
conservative bunch, and I'd be surprised if they really consider btrfs 
ready before mid-year next year, at the earliest.

So if you're looking to test btrfs on enterprise-class hardware, great!  
But do be aware of what you're getting into.  If you have an enterprise 
distro which supports it too, even greater, but know what that actually 
means.  Does it mean they support the same level of 9s uptime on it as 
they normally do, or just that they're ready to accept payment to try and 
recover things if something goes wrong?

If that hasn't scared you off, and you've not read the wiki yet, that's 
probably the next thing you should look at, as it answers a lot of 
questions you may have, as well as some you wouldn't think to ask.  Being 
a wiki, of course, your own contributions are welcome.  In particular, 
you may well be able to cover some of the enterprise-class viewpoint 
questions you're asking, based on your own testing, once you get to that 
point.

https://btrfs.wiki.kernel.org/

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: btrfs RAID with enterprise SATA or SAS drives
  2012-05-11  2:18 ` btrfs RAID with enterprise SATA or SAS drives Duncan
@ 2012-05-11 16:58   ` Martin Steigerwald
  2012-05-14  8:38     ` Duncan
  0 siblings, 1 reply; 10+ messages in thread
From: Martin Steigerwald @ 2012-05-11 16:58 UTC (permalink / raw)
  To: linux-btrfs

Am Freitag, 11. Mai 2012 schrieb Duncan:
> Daniel Pocock posted on Wed, 09 May 2012 22:01:49 +0000 as excerpted:
> > There is various information about
> > - enterprise-class drives (either SAS or just enterprise SATA)
> > - the SCSI/SAS protocols themselves vs SATA having more advanced
> > features (e.g. for dealing with error conditions)
> > than the average block device
> 
> This isn't a direct answer to that, but expressing a bit of concern
> over the implications of your question, that you're planning on using
> btrfs in an enterprise class installation.
> 
> While various Enterprise Linux distributions do now officially
> "support" btrfs, it's worth checking out exactly what that means in
> practice.
> 
> Meanwhile, in mainline Linux kernel terms, btrfs remains very much an
> experimental filesystem, as expressed by the kernel config option that
> turns btrfs on.  It's still under very intensive development, with an
> error-fixing btrfsck only recently available and still coming with its
> own "may make the problems worse instead of fixing them" warning.
> Testers willing to risk the chance of data loss implied by that
> "experimental filesystem" label should be running the latest stable
> kernel at the oldest, and preferably the rcs by rc5 or so: new kernels
> continue to fix problems in older btrfs code as well as introduce new
> features, so if you're running an older kernel, you're running a kernel
> with known problems that are fixed in the latest one.
> 
> Experimental also has implications in terms of backups.  A good
> sysadmin always has backups, but normally, the working copy can be
> considered the primary copy, and there's backups of that.  On an
> experimental filesystem under as intense continued development as
> btrfs, by contrast, it's best to consider your btrfs copy an extra
> "throwaway" copy only intended for testing.  You still have your
> primary copy, along with all the usual backups, on something less
> experimental, since you never know when/where/how your btrfs testing
> will screw up its copy.

Duncan, did you actually test BTRFS? Theory can't replace real life
experience.

From all of my personal BTRFS installations not one has gone corrupt - and
I have at least four, while more of them are in use at my employer. Except
maybe a scratch-data BTRFS RAID 0 over lots of SATA disks - but maybe even
that would have been fixable by btrfs-zero-log, which I didn't know of
back then. Another one needed a btrfs-zero-log, but that was quite some
time ago.
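
For reference, zeroing the log tree is a one-liner against the unmounted
device (it discards the tail of the log, so it is very much a last
resort; the device name is a placeholder):

  btrfs-zero-log /dev/sdX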

Some of the installations have been in use for more than a year, AFAIR.

While I would still be reluctant to deploy BTRFS for a customer for
critical data, and I think Oracle's and SUSE's move to support it
officially is a bit daring, I don't think BTRFS is in a "throwaway copy"
state anymore.

As usual, regular backups are important…

-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7


* Re: btrfs RAID with enterprise SATA or SAS drives
  2012-05-11 16:58   ` Martin Steigerwald
@ 2012-05-14  8:38     ` Duncan
  0 siblings, 0 replies; 10+ messages in thread
From: Duncan @ 2012-05-14  8:38 UTC (permalink / raw)
  To: linux-btrfs

Martin Steigerwald posted on Fri, 11 May 2012 18:58:05 +0200 as excerpted:

> Am Freitag, 11. Mai 2012 schrieb Duncan:
>> Daniel Pocock posted on Wed, 09 May 2012 22:01:49 +0000 as excerpted:
>> > There is various information about - enterprise-class drives

>> This isn't a direct answer to that, but expressing a bit of concern
>> over the implications of your question, that you're planning on using
>> btrfs in an enterprise class installation.

>> [In] mainline Linux kernel terms, btrfs remains very much an
>> experimental filesystem

>> On an experimental filesystem under as intense continued development
>> as btrfs, by contrast, it's best to consider your btrfs copy an extra
>> "throwaway" copy only intended for testing.  You still have your
>> primary copy, along with all the usual backups, on something less
>> experimental, since you never know when/where/how your btrfs testing
>> will screw up its copy.
> 
> Duncan, did you actually test BTRFS? Theory can't replace real life
> experience.

I /had/ been waiting for the n-way-mirrored raid1 roadmapped for after
raid5/6 mode (which should hit 3.5, I believe), but hardware issues
intervened and I'm no longer using those older 4-way md/raid drives as
primary.

And now that I have it, present personal experience does not contradict
what I posted.  btrfs does indeed work reasonably well under reasonably
good, non-stressful, conditions.  But my experience so far aligns quite
well with the "consider the btrfs copy a throw-away copy, just in case"
recommendation.  Just because it's a throw-away copy doesn't mean you'll
have to resort to the "good" copy elsewhere, but it DOES hopefully mean
that you'll have both a "good" copy elsewhere, and a backup for that
supposedly good copy, just in case btrfs does go bad and that supposedly
good primary copy ends up not being good after all.

> From all of my personal BTRFS installations not one has gone corrupt -
> and I have at least four, while more of them are in use at my employer.
> Except maybe a scratch-data BTRFS RAID 0 over lots of SATA disks - but
> maybe even that would have been fixable by btrfs-zero-log, which I
> didn't know of back then. Another one needed a btrfs-zero-log, but that
> was quite some time ago.
> 
> Some of the installations have been in use for more than a year, AFAIR.
> 
> While I would still be reluctant to deploy BTRFS for a customer for
> critical data

This was actually my point in this thread.  If someone's asking questions
about enterprise quality hardware, they're not likely to run into some of
the bugs I've been having recently that have been exposed by hardware
issues.  However, they're also far more likely to be considering btrfs for
a row-of-nines uptime application, which is, after all, where some of
btrfs' features are normally found.  Regardless of whether btrfs is past
the "throw away data experimental class" stage or not, I think we both
agree it isn't ready for row-of-nines-uptime applications just yet.  If
he's just testing btrfs on such equipment for possible future
row-of-nines-uptime deployment a year or possibly two out, great.  If he's
looking at such a deployment two months out, no way, and it looks like you
agree.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: btrfs RAID with RAID cards (thread renamed)
  2012-05-10 19:58 ` Hubert Kario
@ 2012-05-18 16:19   ` Daniel Pocock
  0 siblings, 0 replies; 10+ messages in thread
From: Daniel Pocock @ 2012-05-18 16:19 UTC (permalink / raw)
  To: Hubert Kario; +Cc: linux-btrfs


>> - if a non-RAID SAS card is used, does it matter which card is chosen?
>> Does btrfs work equally well with all of them?
> 
> If you're using btrfs RAID, you need a HBA, not a RAID card. If the RAID 
> card can work as a HBA (usually labelled as JBOD mode) then you're good to 
> go.
> 
> For example, HP CCISS controllers can't work in JBOD mode.

Would you know if they implement their own checksumming, similar to what
btrfs does?  Or if someone uses SmartArray (CCISS) RAID1, then they
simply don't get the full benefit of checksumming under any possible
configuration?

I've had a quick look at what is on the market, here are some observations:

- in many cases, IOPS (critical for SSDs) vary wildly: e.g.
  - SATA-3 SSDs advertise up to 85k IOPS, so RAID1 needs 170k IOPS
  - HP's standard HBAs don't support high IOPS
  - HP Gen8 SmartArray (e.g. P420) claims up to 200k IOPS
  - previous HP arrays (e.g. P212) support only 60k IOPS
  - many vendors don't advertise IOPS prominently - I had to Google the
HP site to find those figures quoted in some PDFs; they don't appear in
the quickspecs or product summary tables

- Adaptec now offers an SSD caching function in hardware; supposedly you
just drop it in the machine and all disks respond faster
  - how would this interact with btrfs checksumming?  E.g. I'm guessing
it would be necessary to ensure that data from both spindles is not
cached on the same SSD?
  - I started thinking about the possibility that data is degraded on
the mechanical disk but btrfs gets a good checksum read from the SSD and
remains blissfully unaware that the real disk is failing; then the other
disk goes completely offline one day, for whatever reason the data is
not in the SSD cache, and the sector can't be read reliably from the
remaining physical disk - should such caching just be avoided, or can it
be managed from btrfs itself in a manner that is foolproof?

How about the combination of btrfs/root/boot filesystems and grub?  Can
they all play nicely together?  This seems to be one compelling factor
with hardware RAID, the cards have a BIOS that can boot from any drive
even if the other is offline.
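
(With md RAID1, I believe the usual trick is to install grub to the MBR
of every member disk so that either one can boot, e.g.:

  grub-install /dev/sda
  grub-install /dev/sdb

I assume the same per-disk installs would be needed with btrfs RAID1,
but I don't know how gracefully grub handles a degraded btrfs.)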






* Re: btrfs RAID with enterprise SATA or SAS drives
  2012-05-09 22:01 btrfs RAID with enterprise SATA or SAS drives Daniel Pocock
  2012-05-10 19:58 ` Hubert Kario
  2012-05-11  2:18 ` btrfs RAID with enterprise SATA or SAS drives Duncan
@ 2014-07-09 14:48 ` Martin Steigerwald
  2014-07-10  2:10   ` Russell Coker
  2 siblings, 1 reply; 10+ messages in thread
From: Martin Steigerwald @ 2014-07-09 14:48 UTC (permalink / raw)
  To: Daniel Pocock; +Cc: linux-btrfs

Am Mittwoch, 9. Mai 2012, 22:01:49 schrieb Daniel Pocock:
> There is various information about
> - enterprise-class drives (either SAS or just enterprise SATA)
> - the SCSI/SAS protocols themselves vs SATA
> having more advanced features (e.g. for dealing with error conditions)
> than the average block device
> 
> For example, Adaptec claims that such drives work better with
> their hardware RAID cards:
[…]
> - for someone using SAS or enterprise SATA drives with Linux, I
> understand btrfs gives the extra benefit of checksums, are there any
> other specific benefits over using mdadm or dmraid?

I think I can answer this one.

The most important advantage, I think, is that BTRFS is aware of which
blocks of the RAID are in use and need to be synced:

- Instant initialization of the RAID regardless of size (unless at some 
capacity mkfs.btrfs needs more time)

- Rebuild after a disk failure or a disk replace will only copy *used*
blocks
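
For example, a replace is a single online command these days (a sketch -
device names are placeholders; /dev/sdb is the failing disk, /dev/sdc
its replacement):

  btrfs replace start /dev/sdb /dev/sdc /mnt
  btrfs replace status /mnt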


Scrubbing can repair from the good disk if the RAID has redundancy, but
SoftRAID should be able to do this as well. The difference: for scrubbing,
too, BTRFS only checks and repairs used blocks.
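
For example (scrub runs while the filesystem is mounted and in use):

  btrfs scrub start /mnt
  btrfs scrub status /mnt   # bytes scrubbed, errors found and corrected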


Another advantage in the future - not yet possible AFAIK:

- Different RAID levels on the same filesystem for different subvolumes;
more flexibility, as subvolumes are dynamically allocated instead of
statically sized

Ciao,
Martin

-- 
Martin Steigerwald
Consultant / Trainer

teamix GmbH
Südwestpark 43
90449 Nürnberg

fon:  +49 911 30999 55
fax:  +49 911 30999 99
mail: martin.steigerwald@teamix.de
web:  http://www.teamix.de
blog: http://blog.teamix.de

Amtsgericht Nürnberg, HRB 18320
Geschäftsführer: Oliver Kügow, Richard Müller




* Re: btrfs RAID with enterprise SATA or SAS drives
  2014-07-09 14:48 ` Martin Steigerwald
@ 2014-07-10  2:10   ` Russell Coker
  2014-07-10  8:27     ` Martin Steigerwald
  2014-07-10 11:28     ` Austin S Hemmelgarn
  0 siblings, 2 replies; 10+ messages in thread
From: Russell Coker @ 2014-07-10  2:10 UTC (permalink / raw)
  To: Martin Steigerwald, linux-btrfs

On Wed, 9 Jul 2014 16:48:05 Martin Steigerwald wrote:
> > - for someone using SAS or enterprise SATA drives with Linux, I
> > understand btrfs gives the extra benefit of checksums, are there any
> > other specific benefits over using mdadm or dmraid?
> 
> I think I can answer this one.
> 
> Most important advantage I think is BTRFS is aware of which blocks of the
> RAID are in use and need to be synced:
> 
> - Instant initialization of RAID regardless of size (unless at some
> capacity mkfs.btrfs needs more time)

From mdadm(8):

       --assume-clean
              Tell mdadm that the array pre-existed and is known to be  clean.
              It  can be useful when trying to recover from a major failure as
              you can be sure that no data will be affected unless  you  actu‐
              ally  write  to  the array.  It can also be used when creating a
              RAID1 or RAID10 if you want to avoid the initial resync, however
              this  practice  — while normally safe — is not recommended.  Use
              this only if you really know what you are doing.

              When the devices that will be part of a new  array  were  filled
              with zeros before creation the operator knows the array is actu‐
              ally clean. If that is the case,  such  as  after  running  bad‐
              blocks,  this  argument  can be used to tell mdadm the facts the
              operator knows.

While it might be regarded as a hack, it is possible to do a fairly instant 
initialisation of a Linux software RAID-1.
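
A sketch (partitions are placeholders - note the man page's caveat that
the members should really contain identical, e.g. zeroed, data):

  mdadm --create /dev/md0 --level=1 --raid-devices=2 --assume-clean \
        /dev/sda1 /dev/sdb1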

> - Rebuild after disk failure or disk replace will only copy *used* blocks

Have you done any benchmarks on this?  The down-side of copying only used
blocks is that you first need to discover which blocks are used.  Given
that seek time is a major bottleneck, past some proportion of space used
it will be faster to just copy the entire disk.

I haven't done any tests on BTRFS in this regard, but I've seen a disk 
replacement on ZFS run significantly slower than a dd of the block device 
would.

> Scrubbing can repair from good disk if RAID with redundancy, but SoftRAID
> should be able to do this as well. But also for scrubbing: BTRFS only
> check and repairs used blocks.

When you scrub Linux Software RAID (and in fact pretty much every RAID) it 
will only correct errors that the disks flag.  If a disk returns bad data and 
says that it's good then the RAID scrub will happily copy the bad data over 
the good data (for a RAID-1) or generate new valid parity blocks for bad data 
(for RAID-5/6).

http://research.cs.wisc.edu/adsl/Publications/corruption-fast08.html

Page 12 of the above document says that "nearline" disks (IE the ones people 
like me can afford for home use) have a 0.466% incidence of returning bad data 
and claiming it's good in a year.  Currently I run about 20 such disks in a 
variety of servers, workstations, and laptops.  Therefore the probability of 
having no such errors on all those disks would be .99534^20=.91081.  The 
probability of having no such errors over a period of 10 years would be 
(.99534^20)^10=.39290 which means that over 10 years I should expect to have 
such errors, which is why BTRFS RAID-1 and DUP metadata on single disks are 
necessary features.

-- 
My Main Blog         http://etbe.coker.com.au/
My Documents Blog    http://doc.coker.com.au/



* Re: btrfs RAID with enterprise SATA or SAS drives
  2014-07-10  2:10   ` Russell Coker
@ 2014-07-10  8:27     ` Martin Steigerwald
  2014-07-10 11:28     ` Austin S Hemmelgarn
  1 sibling, 0 replies; 10+ messages in thread
From: Martin Steigerwald @ 2014-07-10  8:27 UTC (permalink / raw)
  To: russell; +Cc: linux-btrfs

Am Donnerstag, 10. Juli 2014, 12:10:46 schrieb Russell Coker:
> On Wed, 9 Jul 2014 16:48:05 Martin Steigerwald wrote:
> > > - for someone using SAS or enterprise SATA drives with Linux, I
> > > understand btrfs gives the extra benefit of checksums, are there any
> > > other specific benefits over using mdadm or dmraid?
> > 
> > I think I can answer this one.
> > 
> > Most important advantage I think is BTRFS is aware of which blocks of
> > the RAID are in use and need to be synced:
> > 
> > - Instant initialization of RAID regardless of size (unless at some
> > capacity mkfs.btrfs needs more time)
> 
> From mdadm(8):
> 
>        --assume-clean
>               Tell mdadm that the array pre-existed and is known to be 
> clean. It  can be useful when trying to recover from a major failure as
> you can be sure that no data will be affected unless  you  actu‐ ally 
> write  to  the array.  It can also be used when creating a RAID1 or
> RAID10 if you want to avoid the initial resync, however this  practice 
> — while normally safe — is not recommended.  Use this only if you
> really know what you are doing.
> 
>               When the devices that will be part of a new  array  were 
> filled with zeros before creation the operator knows the array is actu‐
> ally clean. If that is the case,  such  as  after  running  bad‐
> blocks,  this  argument  can be used to tell mdadm the facts the
> operator knows.
> 
> While it might be regarded as a hack, it is possible to do a fairly
> instant initialisation of a Linux software RAID-1.

It is not the same.

BTRFS doesn't care if the data of the unused blocks differ.

The RAID is on the *filesystem* level, not on the raw block level. The
data on the two disks doesn't even have to be located in the exact same


> > - Rebuild after disk failure or disk replace will only copy *used*
> > blocks
> Have you done any benchmarks on this?  The down-side of copying used
> blocks is that you first need to discover which blocks are used.  Given
> that seek time is a major bottleneck at some portion of space used it
> will be faster to just copy the entire disk.

As BTRFS operates the RAID on the filesystem level, it already knows which
blocks are in use. I have never yet had a disk replace or a faulty disk in
my two RAID-1 arrays, so I have no measurements. It may depend on free
space fragmentation.

> > Scrubbing can repair from good disk if RAID with redundancy, but
> > SoftRAID should be able to do this as well. But also for scrubbing:
> > BTRFS only check and repairs used blocks.
> 
> When you scrub Linux Software RAID (and in fact pretty much every RAID)
> it will only correct errors that the disks flag.  If a disk returns bad
> data and says that it's good then the RAID scrub will happily copy the
> bad data over the good data (for a RAID-1) or generate new valid parity
> blocks for bad data (for RAID-5/6).
> 
> http://research.cs.wisc.edu/adsl/Publications/corruption-fast08.html
> 
> Page 12 of the above document says that "nearline" disks (IE the ones
> people like me can afford for home use) have a 0.466% incidence of
> returning bad data and claiming it's good in a year.  Currently I run
> about 20 such disks in a variety of servers, workstations, and laptops.
>  Therefore the probability of having no such errors on all those disks
> would be .99534^20=.91081.  The probability of having no such errors
> over a period of 10 years would be (.99534^20)^10=.39290 which means
> that over 10 years I should expect to have such errors, which is why
> BTRFS RAID-1 and DUP metadata on single disks are necessary features.

Yeah, the checksums come in handy here.

(excuse the long signature, it's added by the server)

Ciao,

-- 
Martin Steigerwald
Consultant / Trainer

teamix GmbH
Südwestpark 43
90449 Nürnberg

fon:  +49 911 30999 55
fax:  +49 911 30999 99
mail: martin.steigerwald@teamix.de
web:  http://www.teamix.de
blog: http://blog.teamix.de

Amtsgericht Nürnberg, HRB 18320
Geschäftsführer: Oliver Kügow, Richard Müller




* Re: btrfs RAID with enterprise SATA or SAS drives
  2014-07-10  2:10   ` Russell Coker
  2014-07-10  8:27     ` Martin Steigerwald
@ 2014-07-10 11:28     ` Austin S Hemmelgarn
  1 sibling, 0 replies; 10+ messages in thread
From: Austin S Hemmelgarn @ 2014-07-10 11:28 UTC (permalink / raw)
  To: russell, Martin Steigerwald, linux-btrfs


On 2014-07-09 22:10, Russell Coker wrote:
> On Wed, 9 Jul 2014 16:48:05 Martin Steigerwald wrote:
>>> - for someone using SAS or enterprise SATA drives with Linux, I
>>> understand btrfs gives the extra benefit of checksums, are there any
>>> other specific benefits over using mdadm or dmraid?
>>
>> I think I can answer this one.
>>
>> Most important advantage I think is BTRFS is aware of which blocks of the
>> RAID are in use and need to be synced:
>>
>> - Instant initialization of RAID regardless of size (unless at some
>> capacity mkfs.btrfs needs more time)
> 
> From mdadm(8):
> 
>        --assume-clean
>               Tell mdadm that the array pre-existed and is known to be  clean.
>               It  can be useful when trying to recover from a major failure as
>               you can be sure that no data will be affected unless  you  actu‐
>               ally  write  to  the array.  It can also be used when creating a
>               RAID1 or RAID10 if you want to avoid the initial resync, however
>               this  practice  — while normally safe — is not recommended.  Use
>               this only if you really know what you are doing.
> 
>               When the devices that will be part of a new  array  were  filled
>               with zeros before creation the operator knows the array is actu‐
>               ally clean. If that is the case,  such  as  after  running  bad‐
>               blocks,  this  argument  can be used to tell mdadm the facts the
>               operator knows.
> 
> While it might be regarded as a hack, it is possible to do a fairly instant 
> initialisation of a Linux software RAID-1.
>
This has the notable disadvantage, however, that the first scrub you run
will essentially perform a full resync if you didn't make sure that the
disks had identical data to begin with.
>> - Rebuild after disk failure or disk replace will only copy *used* blocks
> 
> Have you done any benchmarks on this?  The down-side of copying used blocks is 
> that you first need to discover which blocks are used.  Given that seek time is 
> a major bottleneck at some portion of space used it will be faster to just 
> copy the entire disk.
> 
> I haven't done any tests on BTRFS in this regard, but I've seen a disk 
> replacement on ZFS run significantly slower than a dd of the block device 
> would.
> 
First of all, this isn't really a good comparison for two reasons:
1. EVERYTHING on ZFS (or any filesystem that tries to do that much work)
is slower than a dd of the raw block device.
2. Even if the throughput is lower, this is only really an issue if the
disk is more than half full, because you don't copy the unused blocks

Also, while it isn't really a recovery situation, I recently upgraded
from a 2 1TB disk BTRFS RAID1 setup to a 4 1TB disk BTRFS RAID10 setup,
and the performance of the re-balance really wasn't all that bad.  I
have maybe 100GB of actual data, so the array started out roughly 10%
full, and the re-balance only took about 2 minutes.  Of course, it
probably helps that I make a point to keep my filesystems de-fragmented,
scrub and balance regularly, and don't use a lot of sub-volumes or
snapshots, so the filesystem in question is not too different from what
it would have looked like if I had just wiped the FS and restored from a
backup.
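
For reference, that conversion is just (a sketch - device names are
placeholders):

  btrfs device add /dev/sdc /dev/sdd /mnt
  btrfs balance start -dconvert=raid10 -mconvert=raid10 /mnt
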
>> Scrubbing can repair from good disk if RAID with redundancy, but SoftRAID
>> should be able to do this as well. But also for scrubbing: BTRFS only
>> check and repairs used blocks.
> 
> When you scrub Linux Software RAID (and in fact pretty much every RAID) it 
> will only correct errors that the disks flag.  If a disk returns bad data and 
> says that it's good then the RAID scrub will happily copy the bad data over 
> the good data (for a RAID-1) or generate new valid parity blocks for bad data 
> (for RAID-5/6).
> 
> http://research.cs.wisc.edu/adsl/Publications/corruption-fast08.html
> 
> Page 12 of the above document says that "nearline" disks (IE the ones people 
> like me can afford for home use) have a 0.466% incidence of returning bad data 
> and claiming it's good in a year.  Currently I run about 20 such disks in a 
> variety of servers, workstations, and laptops.  Therefore the probability of 
> having no such errors on all those disks would be .99534^20=.91081.  The 
> probability of having no such errors over a period of 10 years would be 
> (.99534^20)^10=.39290 which means that over 10 years I should expect to have 
> such errors, which is why BTRFS RAID-1 and DUP metadata on single disks are 
> necessary features.
> 




