* RE: Scrubbing with BTRFS Raid 5
@ 2014-01-21 18:03 Graham Fleming
  2014-01-22 15:39 ` Duncan
  0 siblings, 1 reply; 14+ messages in thread
From: Graham Fleming @ 2014-01-21 18:03 UTC (permalink / raw)
  To: linux-btrfs

Thanks again for the added info; very helpful.

I want to keep playing around with BTRFS RAID 5 and testing with it... assuming I have a drive with bad blocks, or let's say some inconsistent parity, am I right in assuming that a) a btrfs scrub operation will not fix the stripes with bad parity and b) a balance operation will not be successful? Or would a balance operation work to re-write parity?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Scrubbing with BTRFS Raid 5
  2014-01-21 18:03 Scrubbing with BTRFS Raid 5 Graham Fleming
@ 2014-01-22 15:39 ` Duncan
  0 siblings, 0 replies; 14+ messages in thread
From: Duncan @ 2014-01-22 15:39 UTC (permalink / raw)
  To: linux-btrfs

Graham Fleming posted on Tue, 21 Jan 2014 10:03:26 -0800 as excerpted:

> I want to keep playing around with BTRFS RAID 5 and testing with it...
> assuming I have a drive with bad blocks, or let's say some inconsistent
> parity, am I right in assuming that a) a btrfs scrub operation will not
> fix the stripes with bad parity

What I know is that btrfs scrub is said not to work with btrfs raid5/6 
yet.  I don't know how it actually fails (tho I'd hope it simply returns 
an error to the effect that it doesn't work with raid5/6 yet), as I've 
not actually tried that mode here.

> and b) a balance operation will not be
> successful? Or would a balance operation work to re-write parity?

Balance actually rewrites everything (well, everything matching its 
filters if a filtered balance is used; everything if not), so it should 
rewrite parity correctly.

AFAIK, all the writing works and routine reads work.  It's the error 
recovery that's still only partially implemented.  Since reading uses 
just the data, not the parity (unless there's a dropped device or the 
like to recover from), as long as all devices are active and there's a 
good copy of the data (based on btrfs checksumming) to read, the 
rebalance should just use and rewrite that, ignoring the bad parity.
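
For reference, an unfiltered rebalance is just "btrfs balance start", and 
the filters restrict what gets rewritten -- a rough sketch, with an 
illustrative mount point:

  btrfs balance start /mnt              # rewrites every chunk: data, metadata, parity
  btrfs balance start -dusage=50 /mnt   # filtered: only data chunks at most 50% full

Only the unfiltered form is guaranteed to rewrite everything, of course.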



-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Scrubbing with BTRFS Raid 5
  2014-01-22 21:16       ` Chris Mason
@ 2014-01-22 22:36         ` ronnie sahlberg
  0 siblings, 0 replies; 14+ messages in thread
From: ronnie sahlberg @ 2014-01-22 22:36 UTC (permalink / raw)
  To: Chris Mason; +Cc: linux-btrfs, 1i5t5.duncan

On Wed, Jan 22, 2014 at 1:16 PM, Chris Mason <clm@fb.com> wrote:
> On Wed, 2014-01-22 at 13:06 -0800, ronnie sahlberg wrote:
>> On Wed, Jan 22, 2014 at 12:45 PM, Chris Mason <clm@fb.com> wrote:
>> > On Tue, 2014-01-21 at 17:08 +0000, Duncan wrote:
>> >> Graham Fleming posted on Tue, 21 Jan 2014 01:06:37 -0800 as excerpted:
>> >>
>> >> > Thanks for all the info guys.
>> >> >
>> >> > I ran some tests on the latest 3.12.8 kernel. I set up 3 1GB files and
>> >> > attached them to /dev/loop{1..3} and created a BTRFS RAID 5 volume with
>> >> > them.
>> >> >
>> >> > I copied some data (from /dev/urandom) into two test files and got their
>> >> > MD5 sums and saved them to a text file.
>> >> >
>> >> > I then unmounted the volume, trashed Disk3 and created a new Disk4 file,
>> >> > attached to /dev/loop4.
>> >> >
>> >> > I mounted the BTRFS RAID 5 volume degraded and the md5 sums were fine. I
>> >> > added /dev/loop4 to the volume and then deleted the missing device and
>> >> > it rebalanced. I had data spread out on all three devices now. MD5 sums
>> >> > unchanged on test files.
>> >> >
>> >> > This, to me, implies BTRFS RAID 5 is working quite well and I can,
>> >> > in fact, replace a dead drive.
>> >> >
>> >> > Am I missing something?
>> >>
>> >> What you're missing is that device death and replacement rarely happens
>> >> as neatly as your test (clean unmounts and all, no middle-of-process
>> >> power-loss, etc).  You tested best-case, not real-life or worst-case.
>> >>
>> >> Try that again, setting up the raid5, setting up a big write to it,
>> >> disconnect one device in the middle of that write (I'm not sure if just
>> >> dropping the loop works or if the kernel gracefully shuts down the loop
>> >> device), then unplugging the system without unmounting... and /then/ see
>> >> what sense btrfs can make of the resulting mess.  In theory, with an
>> >> atomic write btree filesystem such as btrfs, even that should work fine,
>> >> minus perhaps the last few seconds of file-write activity, but the
>> >> filesystem should remain consistent on degraded remount and device add,
>> >> device remove, and rebalance, even if another power-pull happens in the
>> >> middle of /that/.
>> >>
>> >> But given btrfs' raid5 incompleteness, I don't expect that will work.
>> >>
>> >
>> > raid5/6 deals with IO errors from one or two drives, and it is able to
>> > reconstruct the parity from the remaining drives and give you good data.
>> >
>> > If we hit a crc error, the raid5/6 code will try a parity reconstruction
>> > to make good data, and if we find good data from the other copy, it'll
>> > return that up to userland.
>> >
>> > In other words, for those cases it works just like raid1/10.  What it
>> > won't do (yet) is write that good data back to the storage.  It'll stay
>> > bad until you remove the device or run balance to rewrite everything.
>> >
>> > Balance will reconstruct parity to get good data as it balances.  This
>> > isn't as useful as scrub, but that work is coming.
>> >
>>
>> That is awesome!
>>
>> What about online conversion from not-raid5/6 to raid5/6 -- what is the
>> status for that code?  For example, what happens if there is a failure
>> during the conversion, or a reboot?
>
> The conversion code uses balance, so that works normally.  If there is a
> failure during the conversion you'll end up with some things raid5/6 and
>> some things at whatever other level you used.
>
> The data will still be there, but you are more prone to enospc
> problems ;)
>

OK, but if there is enough space, you could just restart the balance
and it will eventually finish and, with some luck, all should be OK?
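
I'm guessing the restart would look something like the following, with
the "soft" modifier skipping chunks that were already converted -- a
sketch only, assuming a mount point of /mnt:

  btrfs balance status /mnt     # is a balance paused or was it interrupted?
  btrfs balance resume /mnt     # resume a paused balance, if one is recorded
  # or start a fresh conversion pass; "soft" skips chunks already at the target profile
  btrfs balance start -dconvert=raid5,soft -mconvert=raid5,soft /mnt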

Awesome. This sounds like things are a lot closer to raid5/6 being
fully operational than I realized.


> -chris
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Scrubbing with BTRFS Raid 5
  2014-01-22 21:06     ` ronnie sahlberg
@ 2014-01-22 21:16       ` Chris Mason
  2014-01-22 22:36         ` ronnie sahlberg
  0 siblings, 1 reply; 14+ messages in thread
From: Chris Mason @ 2014-01-22 21:16 UTC (permalink / raw)
  To: ronniesahlberg; +Cc: linux-btrfs, 1i5t5.duncan

On Wed, 2014-01-22 at 13:06 -0800, ronnie sahlberg wrote:
> On Wed, Jan 22, 2014 at 12:45 PM, Chris Mason <clm@fb.com> wrote:
> > On Tue, 2014-01-21 at 17:08 +0000, Duncan wrote:
> >> Graham Fleming posted on Tue, 21 Jan 2014 01:06:37 -0800 as excerpted:
> >>
> >> > Thanks for all the info guys.
> >> >
> >> > I ran some tests on the latest 3.12.8 kernel. I set up 3 1GB files and
> >> > attached them to /dev/loop{1..3} and created a BTRFS RAID 5 volume with
> >> > them.
> >> >
> >> > I copied some data (from /dev/urandom) into two test files and got their
> >> > MD5 sums and saved them to a text file.
> >> >
> >> > I then unmounted the volume, trashed Disk3 and created a new Disk4 file,
> >> > attached to /dev/loop4.
> >> >
> >> > I mounted the BTRFS RAID 5 volume degraded and the md5 sums were fine. I
> >> > added /dev/loop4 to the volume and then deleted the missing device and
> >> > it rebalanced. I had data spread out on all three devices now. MD5 sums
> >> > unchanged on test files.
> >> >
> >> > This, to me, implies BTRFS RAID 5 is working quite well and I can,
> >> > in fact, replace a dead drive.
> >> >
> >> > Am I missing something?
> >>
> >> What you're missing is that device death and replacement rarely happens
> >> as neatly as your test (clean unmounts and all, no middle-of-process
> >> power-loss, etc).  You tested best-case, not real-life or worst-case.
> >>
> >> Try that again, setting up the raid5, setting up a big write to it,
> >> disconnect one device in the middle of that write (I'm not sure if just
> >> dropping the loop works or if the kernel gracefully shuts down the loop
> >> device), then unplugging the system without unmounting... and /then/ see
> >> what sense btrfs can make of the resulting mess.  In theory, with an
> >> atomic write btree filesystem such as btrfs, even that should work fine,
> >> minus perhaps the last few seconds of file-write activity, but the
> >> filesystem should remain consistent on degraded remount and device add,
> >> device remove, and rebalance, even if another power-pull happens in the
> >> middle of /that/.
> >>
> >> But given btrfs' raid5 incompleteness, I don't expect that will work.
> >>
> >
> > raid5/6 deals with IO errors from one or two drives, and it is able to
> > reconstruct the parity from the remaining drives and give you good data.
> >
> > If we hit a crc error, the raid5/6 code will try a parity reconstruction
> > to make good data, and if we find good data from the other copy, it'll
> > return that up to userland.
> >
> > In other words, for those cases it works just like raid1/10.  What it
> > won't do (yet) is write that good data back to the storage.  It'll stay
> > bad until you remove the device or run balance to rewrite everything.
> >
> > Balance will reconstruct parity to get good data as it balances.  This
> > isn't as useful as scrub, but that work is coming.
> >
> 
> That is awesome!
> 
> What about online conversion from not-raid5/6 to raid5/6 -- what is the
> status for that code?  For example, what happens if there is a failure
> during the conversion, or a reboot?

The conversion code uses balance, so that works normally.  If there is a
failure during the conversion you'll end up with some things raid5/6 and
some things at whatever other level you used.

The data will still be there, but you are more prone to enospc
problems ;)
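
Concretely, the conversion is just a filtered balance along these lines
(mount point illustrative):

  # convert data and metadata chunks to raid5 in place
  btrfs balance start -dconvert=raid5 -mconvert=raid5 /mnt
  btrfs balance status /mnt     # check progress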

-chris


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Scrubbing with BTRFS Raid 5
  2014-01-22 20:45   ` Chris Mason
@ 2014-01-22 21:06     ` ronnie sahlberg
  2014-01-22 21:16       ` Chris Mason
  0 siblings, 1 reply; 14+ messages in thread
From: ronnie sahlberg @ 2014-01-22 21:06 UTC (permalink / raw)
  To: Chris Mason; +Cc: 1i5t5.duncan, linux-btrfs

On Wed, Jan 22, 2014 at 12:45 PM, Chris Mason <clm@fb.com> wrote:
> On Tue, 2014-01-21 at 17:08 +0000, Duncan wrote:
>> Graham Fleming posted on Tue, 21 Jan 2014 01:06:37 -0800 as excerpted:
>>
>> > Thanks for all the info guys.
>> >
>> > I ran some tests on the latest 3.12.8 kernel. I set up 3 1GB files and
>> > attached them to /dev/loop{1..3} and created a BTRFS RAID 5 volume with
>> > them.
>> >
>> > I copied some data (from /dev/urandom) into two test files and got their
>> > MD5 sums and saved them to a text file.
>> >
>> > I then unmounted the volume, trashed Disk3 and created a new Disk4 file,
>> > attached to /dev/loop4.
>> >
>> > I mounted the BTRFS RAID 5 volume degraded and the md5 sums were fine. I
>> > added /dev/loop4 to the volume and then deleted the missing device and
>> > it rebalanced. I had data spread out on all three devices now. MD5 sums
>> > unchanged on test files.
>> >
>> > This, to me, implies BTRFS RAID 5 is working quite well and I can,
>> > in fact, replace a dead drive.
>> >
>> > Am I missing something?
>>
>> What you're missing is that device death and replacement rarely happens
>> as neatly as your test (clean unmounts and all, no middle-of-process
>> power-loss, etc).  You tested best-case, not real-life or worst-case.
>>
>> Try that again, setting up the raid5, setting up a big write to it,
>> disconnect one device in the middle of that write (I'm not sure if just
>> dropping the loop works or if the kernel gracefully shuts down the loop
>> device), then unplugging the system without unmounting... and /then/ see
>> what sense btrfs can make of the resulting mess.  In theory, with an
>> atomic write btree filesystem such as btrfs, even that should work fine,
>> minus perhaps the last few seconds of file-write activity, but the
>> filesystem should remain consistent on degraded remount and device add,
>> device remove, and rebalance, even if another power-pull happens in the
>> middle of /that/.
>>
>> But given btrfs' raid5 incompleteness, I don't expect that will work.
>>
>
> raid5/6 deals with IO errors from one or two drives, and it is able to
> reconstruct the parity from the remaining drives and give you good data.
>
> If we hit a crc error, the raid5/6 code will try a parity reconstruction
> to make good data, and if we find good data from the other copy, it'll
> return that up to userland.
>
> In other words, for those cases it works just like raid1/10.  What it
> won't do (yet) is write that good data back to the storage.  It'll stay
> bad until you remove the device or run balance to rewrite everything.
>
> Balance will reconstruct parity to get good data as it balances.  This
> isn't as useful as scrub, but that work is coming.
>

That is awesome!

What about online conversion from not-raid5/6 to raid5/6 -- what is the
status for that code?  For example, what happens if there is a failure
during the conversion, or a reboot?



> -chris
>
>
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Scrubbing with BTRFS Raid 5
  2014-01-21 17:08 ` Duncan
  2014-01-21 17:18   ` Jim Salter
@ 2014-01-22 20:45   ` Chris Mason
  2014-01-22 21:06     ` ronnie sahlberg
  1 sibling, 1 reply; 14+ messages in thread
From: Chris Mason @ 2014-01-22 20:45 UTC (permalink / raw)
  To: 1i5t5.duncan; +Cc: linux-btrfs

On Tue, 2014-01-21 at 17:08 +0000, Duncan wrote:
> Graham Fleming posted on Tue, 21 Jan 2014 01:06:37 -0800 as excerpted:
> 
> > Thanks for all the info guys.
> > 
> > I ran some tests on the latest 3.12.8 kernel. I set up 3 1GB files and
> > attached them to /dev/loop{1..3} and created a BTRFS RAID 5 volume with
> > them.
> > 
> > I copied some data (from /dev/urandom) into two test files and got their
> > MD5 sums and saved them to a text file.
> > 
> > I then unmounted the volume, trashed Disk3 and created a new Disk4 file,
> > attached to /dev/loop4.
> > 
> > I mounted the BTRFS RAID 5 volume degraded and the md5 sums were fine. I
> > added /dev/loop4 to the volume and then deleted the missing device and
> > it rebalanced. I had data spread out on all three devices now. MD5 sums
> > unchanged on test files.
> > 
> > This, to me, implies BTRFS RAID 5 is working quite well and I can,
> > in fact, replace a dead drive.
> > 
> > Am I missing something?
> 
> What you're missing is that device death and replacement rarely happens 
> as neatly as your test (clean unmounts and all, no middle-of-process 
> power-loss, etc).  You tested best-case, not real-life or worst-case.
> 
> Try that again, setting up the raid5, setting up a big write to it, 
> disconnect one device in the middle of that write (I'm not sure if just 
> dropping the loop works or if the kernel gracefully shuts down the loop 
> device), then unplugging the system without unmounting... and /then/ see 
> what sense btrfs can make of the resulting mess.  In theory, with an 
> atomic write btree filesystem such as btrfs, even that should work fine, 
> minus perhaps the last few seconds of file-write activity, but the 
> filesystem should remain consistent on degraded remount and device add, 
> device remove, and rebalance, even if another power-pull happens in the 
> middle of /that/.
> 
> But given btrfs' raid5 incompleteness, I don't expect that will work.
> 

raid5/6 deals with IO errors from one or two drives, and it is able to
reconstruct the parity from the remaining drives and give you good data.

If we hit a crc error, the raid5/6 code will try a parity reconstruction
to make good data, and if we find good data from the other copy, it'll
return that up to userland.

In other words, for those cases it works just like raid1/10.  What it
won't do (yet) is write that good data back to the storage.  It'll stay
bad until you remove the device or run balance to rewrite everything.

Balance will reconstruct parity to get good data as it balances.  This
isn't as useful as scrub, but that work is coming.
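
In practice that means something like: keep an eye on the error counters,
and when corruption shows up, rebalance to rewrite the affected chunks.
A sketch, with an illustrative mount point:

  btrfs device stats /mnt     # per-device read/write/corruption error counters
  dmesg | grep -i btrfs       # csum failure warnings land in the kernel log
  btrfs balance start /mnt    # rewrites everything, regenerating parity as it goes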

-chris




^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Scrubbing with BTRFS Raid 5
  2014-01-21 17:18   ` Jim Salter
  2014-01-21 17:38     ` Chris Murphy
@ 2014-01-22 16:02     ` Duncan
  1 sibling, 0 replies; 14+ messages in thread
From: Duncan @ 2014-01-22 16:02 UTC (permalink / raw)
  To: linux-btrfs

Jim Salter posted on Tue, 21 Jan 2014 12:18:01 -0500 as excerpted:

> Would it be reasonably accurate to say "btrfs' RAID5 implementation is
> likely working well enough and safe enough if you are backing up
> regularly and are willing and able to restore from backup if necessary
> should a device failure go horribly wrong", then?

I'd say (and IIRC I did say somewhere, but don't remember if it was this 
thread) that in reliability terms btrfs raid5 should be treated like 
btrfs raid0 at this point.  Raid0 is well known to have absolutely no 
failover -- if a device fails, the raid is toast.  It's possible so-
called "extreme measures" may recover data from the surviving bits (think 
the $expen$ive$ $ervice$ of data recovery firms), but the idea is that 
either no data that's not easily replaced is stored on a raid0 in the 
first place, or if it is, there's a (tested, recoverable) backup to the 
level that you're fully comfortable with losing EVERYTHING not backed up.

Examples of good data for raid0 are the kernel sources (as a user, not a 
dev, so you're not hacking on them), your distro's local package cache, 
browser cache, etc.  This because by definition all those examples have 
the net as their backup, so loss of a local copy means a bit more to 
download, at worst.

That's what btrfs raid5/6 are at the moment, effectively raid0 from a 
recovery perspective.

Now the parity /is/ being written; it simply can't be treated as 
available for recovery.  So supposing you do /not/ lose a device (or 
suffer a bad checksum) on the raid5 until after the recovery code is 
complete and available, you'll have gotten an effectively "free" upgrade 
from raid0 reliability to raid5 reliability as soon as recovery is 
possible, which will be nice, and meanwhile you can test the operational 
functionality.  So there /are/ reasons you might want to run btrfs raid5 
mode now -- as long as you remember it's currently effectively raid0 
should something go wrong, and you either don't use it for valuable data 
in the first place, or you're willing to do without any updates to that 
data since the last tested backup, should it come to that.
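
You can confirm which profiles a filesystem is actually using with 
something like the following (the exact output layout varies a bit 
between btrfs-progs versions):

  btrfs filesystem df /mnt
  # look for lines along the lines of "Data, RAID5:" and "Metadata, RAID5:"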

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Scrubbing with BTRFS Raid 5
  2014-01-21 17:38     ` Chris Murphy
@ 2014-01-21 18:25       ` Jim Salter
  0 siblings, 0 replies; 14+ messages in thread
From: Jim Salter @ 2014-01-21 18:25 UTC (permalink / raw)
  To: Chris Murphy, Btrfs BTRFS

There are different values of "testing" and of "production" - in my 
world, at least, they're not atomically defined categories. =)

On 01/21/2014 12:38 PM, Chris Murphy wrote:
> It's for testing purposes. If you really want to commit a production 
> machine for testing a file system, and you're prepared to lose 100% of 
> changes since the last backup, OK do that.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Scrubbing with BTRFS Raid 5
  2014-01-21 17:18   ` Jim Salter
@ 2014-01-21 17:38     ` Chris Murphy
  2014-01-21 18:25       ` Jim Salter
  2014-01-22 16:02     ` Duncan
  1 sibling, 1 reply; 14+ messages in thread
From: Chris Murphy @ 2014-01-21 17:38 UTC (permalink / raw)
  To: Btrfs BTRFS


On Jan 21, 2014, at 10:18 AM, Jim Salter <jim@jrs-s.net> wrote:

> Would it be reasonably accurate to say "btrfs' RAID5 implementation is likely working well enough and safe enough if you are backing up regularly and are willing and able to restore from backup if necessary should a device failure go horribly wrong", then?

It's for testing purposes. If you really want to commit a production machine for testing a file system, and you're prepared to lose 100% of changes since the last backup, OK do that.

> If the worst thing wrong with RAID5/6 in current btrfs is "might not deal as well as you'd like with a really nasty example of single-drive failure", that would likely be livable for me.

It was just one hypothetical scenario; it's not the only one. If it's really, truly, seriously being tested, eventually you'll break it.

Chris Murphy

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Scrubbing with BTRFS Raid 5
  2014-01-21 17:08 ` Duncan
@ 2014-01-21 17:18   ` Jim Salter
  2014-01-21 17:38     ` Chris Murphy
  2014-01-22 16:02     ` Duncan
  2014-01-22 20:45   ` Chris Mason
  1 sibling, 2 replies; 14+ messages in thread
From: Jim Salter @ 2014-01-21 17:18 UTC (permalink / raw)
  To: Duncan, linux-btrfs

Would it be reasonably accurate to say "btrfs' RAID5 implementation is 
likely working well enough and safe enough if you are backing up 
regularly and are willing and able to restore from backup if necessary 
should a device failure go horribly wrong", then?

This is a reasonably serious question. My typical scenario runs along 
the lines of two identical machines with regular filesystem replication 
between them; in the event of something going horribly horribly wrong 
with the production machine, I just spin up services on the replicated 
machine - making it "production" - and then deal with the broken one at 
relative leisure.
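
For what it's worth, one way to do that sort of replication with btrfs 
itself is scheduled read-only snapshots shipped over with send/receive -- 
a sketch only, with made-up host and path names:

  btrfs subvolume snapshot -r /data /data/.snap-today
  btrfs send -p /data/.snap-yesterday /data/.snap-today \
      | ssh standby btrfs receive /data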

If the worst thing wrong with RAID5/6 in current btrfs is "might not 
deal as well as you'd like with a really nasty example of single-drive 
failure", that would likely be livable for me.

On 01/21/2014 12:08 PM, Duncan wrote:
 > What you're missing is that device death and replacement rarely happens
 > as neatly as your test (clean unmounts and all, no middle-of-process
 > power-loss, etc).  You tested best-case, not real-life or worst-case.
 >
 > Try that again, setting up the raid5, setting up a big write to it,
 > disconnect one device in the middle of that write (I'm not sure if just
 > dropping the loop works or if the kernel gracefully shuts down the loop
 > device), then unplugging the system without unmounting... and /then/ see
 > what sense btrfs can make of the resulting mess.  In theory, with an
 > atomic write btree filesystem such as btrfs, even that should work fine,
 > minus perhaps the last few seconds of file-write activity, but the
 > filesystem should remain consistent on degraded remount and device add,
 > device remove, and rebalance, even if another power-pull happens in the
 > middle of /that/.
 >
 > But given btrfs' raid5 incompleteness, I don't expect that will work.
 >


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Scrubbing with BTRFS Raid 5
  2014-01-21  9:06 Graham Fleming
@ 2014-01-21 17:08 ` Duncan
  2014-01-21 17:18   ` Jim Salter
  2014-01-22 20:45   ` Chris Mason
  0 siblings, 2 replies; 14+ messages in thread
From: Duncan @ 2014-01-21 17:08 UTC (permalink / raw)
  To: linux-btrfs

Graham Fleming posted on Tue, 21 Jan 2014 01:06:37 -0800 as excerpted:

> Thanks for all the info guys.
> 
> I ran some tests on the latest 3.12.8 kernel. I set up 3 1GB files and
> attached them to /dev/loop{1..3} and created a BTRFS RAID 5 volume with
> them.
> 
> I copied some data (from /dev/urandom) into two test files and got their
> MD5 sums and saved them to a text file.
> 
> I then unmounted the volume, trashed Disk3 and created a new Disk4 file,
> attached to /dev/loop4.
> 
> I mounted the BTRFS RAID 5 volume degraded and the md5 sums were fine. I
> added /dev/loop4 to the volume and then deleted the missing device and
> it rebalanced. I had data spread out on all three devices now. MD5 sums
> unchanged on test files.
> 
> This, to me, implies BTRFS RAID 5 is working quite well and I can,
> in fact, replace a dead drive.
> 
> Am I missing something?

What you're missing is that device death and replacement rarely happens 
as neatly as your test (clean unmounts and all, no middle-of-process 
power-loss, etc).  You tested best-case, not real-life or worst-case.

Try that again, setting up the raid5, setting up a big write to it, 
disconnect one device in the middle of that write (I'm not sure if just 
dropping the loop works or if the kernel gracefully shuts down the loop 
device), then unplugging the system without unmounting... and /then/ see 
what sense btrfs can make of the resulting mess.  In theory, with an 
atomic write btree filesystem such as btrfs, even that should work fine, 
minus perhaps the last few seconds of file-write activity, but the 
filesystem should remain consistent on degraded remount and device add, 
device remove, and rebalance, even if another power-pull happens in the 
middle of /that/.
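
Something along these lines, tho I've not actually tried it here -- loop 
device names are illustrative, and as said I'm not sure dropping the loop 
behaves quite like a real device dropping out:

  mount /dev/loop1 /mnt                                  # the existing raid5
  dd if=/dev/urandom of=/mnt/bigfile bs=1M count=2000 &  # big write in the background
  sleep 5; losetup -d /dev/loop3                         # try to yank one device mid-write
  echo b > /proc/sysrq-trigger                           # "pull the plug": reboot, no sync/unmount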

But given btrfs' raid5 incompleteness, I don't expect that will work.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: Scrubbing with BTRFS Raid 5
@ 2014-01-21  9:06 Graham Fleming
  2014-01-21 17:08 ` Duncan
  0 siblings, 1 reply; 14+ messages in thread
From: Graham Fleming @ 2014-01-21  9:06 UTC (permalink / raw)
  To: linux-btrfs

Thanks for all the info guys.

I ran some tests on the latest 3.12.8 kernel. I set up 3 1GB files and attached them to /dev/loop{1..3} and created a BTRFS RAID 5 volume with them.

I copied some data (from /dev/urandom) into two test files and got their MD5 sums and saved them to a text file.

I then unmounted the volume, trashed Disk3 and created a new Disk4 file, attached to /dev/loop4.

I mounted the BTRFS RAID 5 volume degraded and the md5 sums were fine. I added /dev/loop4 to the volume and then deleted the missing device and it rebalanced. I had data spread out on all three devices now. MD5 sums unchanged on test files.
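
The rough sequence, for anyone who wants to reproduce it (file names, 
sizes and mount point are illustrative):

  for i in 1 2 3; do truncate -s 1G disk$i.img; losetup /dev/loop$i disk$i.img; done
  mkfs.btrfs -d raid5 -m raid5 /dev/loop1 /dev/loop2 /dev/loop3
  mount /dev/loop1 /mnt
  dd if=/dev/urandom of=/mnt/test1 bs=1M count=100
  dd if=/dev/urandom of=/mnt/test2 bs=1M count=100
  md5sum /mnt/test1 /mnt/test2 > /root/sums.txt
  umount /mnt
  losetup -d /dev/loop3 && rm disk3.img            # "trash" Disk3
  truncate -s 1G disk4.img && losetup /dev/loop4 disk4.img
  mount -o degraded /dev/loop1 /mnt
  btrfs device add /dev/loop4 /mnt
  btrfs device delete missing /mnt                 # relocates data onto the new device
  md5sum -c /root/sums.txt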

This, to me, implies BTRFS RAID 5 is working quite well and I can, in fact, replace a dead drive.

Am I missing something?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Scrubbing with BTRFS Raid 5
  2014-01-20  0:53 Graham Fleming
@ 2014-01-20 13:21 ` Duncan
  0 siblings, 0 replies; 14+ messages in thread
From: Duncan @ 2014-01-20 13:21 UTC (permalink / raw)
  To: linux-btrfs

Graham Fleming posted on Sun, 19 Jan 2014 16:53:13 -0800 as excerpted:

> From the wiki, I see that scrubbing is not supported on a RAID 5 volume.
> 
> Can I still run the scrub routine (maybe read-only?) to check for any
> issues?  I understand that at this point, running a 3.12 kernel, there
> are no routines to fix parity issues with RAID 5 while scrubbing, but I
> just want to know whether a) I'm not causing any harm by running the
> scrub on a RAID 5 volume, and b) it's actually going to provide me with
> useful feedback (i.e. file X is damaged).

This isn't a direct answer to your question, but answers a somewhat more 
basic question...

Btrfs raid5/6 isn't ready for use in a live environment yet, period -- 
only for testing, where the reliability of the data beyond the test 
doesn't matter.  It works as long as everything works normally, writing 
out the parity blocks as well as the data, but besides scrub not yet 
being implemented, neither is recovery from loss of a device, or from an 
out-of-sync-state power-off.

Since the whole /point/ of raid5/6 is recovery from device loss, without 
that it's simply a less efficient raid0, which accepts the risk of full 
data loss if a device is lost in order to gain the higher thruput of N-
way data striping.  So in practice at this point, if you're willing to 
accept loss of all data and want the higher thruput, you'd use raid0 or 
perhaps single mode instead, or if not, you'd use raid1 or raid10 mode.
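
I.e. for anything you actually care about today, create (or convert to) 
one of the working profiles instead; device names below are placeholders:

  mkfs.btrfs -d raid1 -m raid1 /dev/sdb /dev/sdc     # two-device raid1
  mkfs.btrfs -d raid10 -m raid10 /dev/sd[b-e]        # raid10 needs four devices
  # or convert an existing filesystem with a filtered balance
  btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt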

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Scrubbing with BTRFS Raid 5
@ 2014-01-20  0:53 Graham Fleming
  2014-01-20 13:21 ` Duncan
  0 siblings, 1 reply; 14+ messages in thread
From: Graham Fleming @ 2014-01-20  0:53 UTC (permalink / raw)
  To: linux-btrfs

From the wiki, I see that scrubbing is not supported on a RAID 5 volume.

Can I still run the scrub routine (maybe read-only?) to check for any issues? I understand that at this point, running a 3.12 kernel, there are no routines to fix parity issues with RAID 5 while scrubbing, but I just want to know whether a) I'm not causing any harm by running the scrub on a RAID 5 volume, and b) it's actually going to provide me with useful feedback (i.e. file X is damaged).
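
To be concrete, what I have in mind is something along these lines (mount 
point illustrative):

  btrfs scrub start -r /mnt     # -r: read-only, don't attempt any repairs
  btrfs scrub status /mnt       # summary of errors found so far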

Thanks

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2014-01-22 22:36 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-01-21 18:03 Scrubbing with BTRFS Raid 5 Graham Fleming
2014-01-22 15:39 ` Duncan
  -- strict thread matches above, loose matches on Subject: below --
2014-01-21  9:06 Graham Fleming
2014-01-21 17:08 ` Duncan
2014-01-21 17:18   ` Jim Salter
2014-01-21 17:38     ` Chris Murphy
2014-01-21 18:25       ` Jim Salter
2014-01-22 16:02     ` Duncan
2014-01-22 20:45   ` Chris Mason
2014-01-22 21:06     ` ronnie sahlberg
2014-01-22 21:16       ` Chris Mason
2014-01-22 22:36         ` ronnie sahlberg
2014-01-20  0:53 Graham Fleming
2014-01-20 13:21 ` Duncan
