Is metadata redundant over more than one drive with raid0 too?


* Is metadata redundant over more than one drive with raid0 too?
@ 2014-05-03 23:27 Marc MERLIN
  2014-05-04  6:57 ` Brendan Hide
  2014-05-04 21:49 ` Is metadata redundant over more than one drive with raid0 too? Duncan
  0 siblings, 2 replies; 15+ messages in thread
From: Marc MERLIN @ 2014-05-03 23:27 UTC
  To: linux-btrfs

So, I was thinking. In the past, I've done this:
mkfs.btrfs -d raid0 -m raid1 -L btrfs_raid0 /dev/mapper/raid0d*

My rationale at the time was that if I lose a drive, I'll still have
full metadata for the entire filesystem and only missing files.
If I have raid1 with 2 drives, I should end up with 4 copies of each
file's metadata, right?

But now I have 2 questions
1) btrfs has two copies of all metadata on even a single drive, correct?
If so, and I have a -d raid0 -m raid0 filesystem, are both copies of the
metadata on the same drive or is btrfs smart enough to spread out
metadata copies so that they're not on the same drive?

2) does btrfs lay out files on raid0 so that files aren't striped across
more than one drive, so that if I lose a drive, I only lose whole files,
but not little chunks of all my files, making my entire FS toast?

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Is metadata redundant over more than one drive with raid0 too?
  2014-05-03 23:27 Is metadata redundant over more than one drive with raid0 too? Marc MERLIN
@ 2014-05-04  6:57 ` Brendan Hide
  2014-05-04  7:24   ` Marc MERLIN
  2014-05-04 21:49 ` Is metadata redundant over more than one drive with raid0 too? Duncan
  1 sibling, 1 reply; 15+ messages in thread
From: Brendan Hide @ 2014-05-04  6:57 UTC
  To: Marc MERLIN, linux-btrfs

Hi, Marc

Raid0 is not redundant in any way. See inline below.

On 2014/05/04 01:27 AM, Marc MERLIN wrote:
> So, I was thinking. In the past, I've done this:
> mkfs.btrfs -d raid0 -m raid1 -L btrfs_raid0 /dev/mapper/raid0d*
>
> My rationale at the time was that if I lose a drive, I'll still have
> full metadata for the entire filesystem and only missing files.
> If I have raid1 with 2 drives, I should end up with 4 copies of each
> file's metadata, right?
>
> But now I have 2 questions
> 1) btrfs has two copies of all metadata on even a single drive, correct?

Only when *specifically* using -m dup (which is the default on a single
non-SSD device) will there be two copies of the metadata stored on a
single device. This is not recommended when using multiple devices, as it
means one device failure will likely cause critical loss of metadata.
When using -m raid1 (as is the case in your first example above, and as
is the default with multiple devices), the two copies of the metadata are
distributed across two devices, each of which holds only one of the
copies.
> If so, and I have a -d raid0 -m raid0 filesystem, are both copies of the
> metadata on the same drive or is btrfs smart enough to spread out
> metadata copies so that they're not on the same drive?

This will mean there is only a single copy, albeit striped across the 
drives.
>
> 2) does btrfs lay out files on raid0 so that files aren't striped across
> more than one drive, so that if I lose a drive, I only lose whole files,
> but not little chunks of all my files, making my entire FS toast?

"raid0" currently allocates a single chunk on each device and then makes 
use of "RAID0-like" stripes across these chunks until a new chunk needs 
to be allocated. This is good for performance but not good for 
redundancy. A total failure of a single device will mean any large files 
will be lost and only files smaller than the default per-disk stripe 
width (I believe this used to be 4K and is now 16K - I could be wrong) 
stored only on the remaining disk will be available.
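To make the description above concrete, here is a toy model of raid0 data striping, written for this note rather than taken from btrfs: the logical address space is dealt out round-robin in fixed-size strips, and a file survives the loss of a device only if none of its strips sit on that device. It assumes each file occupies one contiguous logical range and ignores metadata and inline files. The 64 KiB strip size is an assumption (my understanding is that btrfs uses 64 KiB stripe elements, BTRFS_STRIPE_LEN), so treat it as a parameter; `devices_touched` and `survivors` are hypothetical helpers.

```python
# Toy model of raid0 data striping: strip k of the logical address space
# lands on device k % N. A simplification, not btrfs's real allocator.

STRIP = 64 * 1024  # assumed per-device strip size (see note above)


def devices_touched(start: int, length: int, n_devices: int, strip: int = STRIP) -> set[int]:
    """Return the indices of the devices holding any byte of [start, start + length)."""
    if length <= 0:
        return set()
    first = start // strip
    last = (start + length - 1) // strip
    if last - first + 1 >= n_devices:
        # The range covers at least one full stripe, so every device holds part of it.
        return set(range(n_devices))
    return {k % n_devices for k in range(first, last + 1)}


def survivors(files: list[tuple[str, int, int]], n_devices: int, failed: int,
              strip: int = STRIP) -> list[str]:
    """Names of files (name, logical_start, length) with no byte on the failed device."""
    return [name for name, start, length in files
            if failed not in devices_touched(start, length, n_devices, strip)]


files = [
    ("mail1", 0, 4096),                 # fits in strip 0 -> device 0 only
    ("mail2", STRIP, 4096),             # fits in strip 1 -> device 1 only
    ("video", 2 * STRIP, 1024 * 1024),  # spans 16 strips -> both devices
]
print(survivors(files, n_devices=2, failed=1))  # ['mail1']
```

Only the small file that happened to sit on the surviving device comes through; anything larger than a strip is spread over both devices and is lost.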

The scenario you mentioned at the beginning, "if I lose a drive, I'll 
still have full metadata for the entire filesystem and only missing 
files" is more applicable to using "-m raid1 -d single". Single is not 
geared towards performance and, though it doesn't guarantee a file is 
only on a single disk, the allocation does mean that the majority of all 
files smaller than a chunk will be stored on only one disk or the other 
- not both.
>
> Thanks,
> Marc

I hope the above is helpful.

-- 
__________
Brendan Hide
http://swiftspirit.co.za/
http://www.webafrica.co.za/?AFF1E97


* Re: Is metadata redundant over more than one drive with raid0 too?
  2014-05-04  6:57 ` Brendan Hide
@ 2014-05-04  7:24   ` Marc MERLIN
  2014-05-04  7:44     ` Brendan Hide
  2014-05-05  0:46     ` Daniel Lee
  0 siblings, 2 replies; 15+ messages in thread
From: Marc MERLIN @ 2014-05-04  7:24 UTC
  To: Brendan Hide; +Cc: linux-btrfs

On Sun, May 04, 2014 at 08:57:19AM +0200, Brendan Hide wrote:
> Hi, Marc
> 
> Raid0 is not redundant in any way. See inline below.
 
Thanks for clearing things up.

> >But now I have 2 questions
> >1) btrfs has two copies of all metadata on even a single drive, correct?
> 
> Only when *specifically* using -m dup (which is the default on a
> single non-SSD device), will there be two copies of the metadata
> stored on a single device. This is not recommended when using

Ah, so -m dup is default like I thought, but not on SSD?
Oops, that means that my laptop does not have redundant metadata on its
SSD like I thought. Thanks for the heads up.
Ah, I see the man page now "This is because SSDs can remap blocks
internally so duplicate blocks could end up in the same erase block
which negates the benefits of doing metadata duplication."

> multiple devices as it means one device failure will likely cause
> critical loss of metadata. 

That's the part where I'm not clear:

What's the difference between -m dup and -m raid1?
Don't they both say 2 copies of the metadata?
Is -m dup only valid for a single drive, while -m raid1 for 2+ drives?

> >If so, and I have a -d raid0 -m raid0 filesystem, are both copies of the
> >metadata on the same drive or is btrfs smart enough to spread out
> >metadata copies so that they're not on the same drive?
> 
> This will mean there is only a single copy, albeit striped across
> the drives.

Ok, so -m raid0 only means a single copy of metadata, thanks for
explaining.

> good for redundancy. A total failure of a single device will mean
> any large files will be lost and only files smaller than the default
> per-disk stripe width (I believe this used to be 4K and is now 16K -
> I could be wrong) stored only on the remaining disk will be
> available.
 
Gotcha, thanks for confirming, so -m raid1 -d raid0 really only protects
against metadata corruption or a single block loss, but otherwise if you
lost a drive in a 2 drive raid0, you'll have lost more than just half
your files.

> The scenario you mentioned at the beginning, "if I lose a drive,
> I'll still have full metadata for the entire filesystem and only
> missing files" is more applicable to using "-m raid1 -d single".
> Single is not geared towards performance and, though it doesn't
> guarantee a file is only on a single disk, the allocation does mean
> that the majority of all files smaller than a chunk will be stored
> on only one disk or the other - not both.

Ok, so in other words:
-d raid0: if you lose 1 drive out of 2, you may end up with small files
and the rest will be lost

-d single: you're more likely to have files be on one drive or the
other, although there is no guarantee there either.

Correct?

Thanks,
Marc


* Re: Is metadata redundant over more than one drive with raid0 too?
  2014-05-04  7:24   ` Marc MERLIN
@ 2014-05-04  7:44     ` Brendan Hide
  2014-05-05  1:27       ` Marc MERLIN
  2014-05-05  0:46     ` Daniel Lee
  1 sibling, 1 reply; 15+ messages in thread
From: Brendan Hide @ 2014-05-04  7:44 UTC
  To: Marc MERLIN; +Cc: linux-btrfs

On 2014/05/04 09:24 AM, Marc MERLIN wrote:
> On Sun, May 04, 2014 at 08:57:19AM +0200, Brendan Hide wrote:
>> Hi, Marc
>>
>> Raid0 is not redundant in any way. See inline below.
>   
> Thanks for clearing things up.
>
>>> But now I have 2 questions
>>> 1) btrfs has two copies of all metadata on even a single drive, correct?
>> Only when *specifically* using -m dup (which is the default on a
>> single non-SSD device), will there be two copies of the metadata
>> stored on a single device. This is not recommended when using
> Ah, so -m dup is default like I thought, but not on SSD?
> Ooops, that means that my laptop does not have redundant metadata on its
> SSD like I thought. Thanks for the heads up.
> Ah, I see the man page now "This is because SSDs can remap blocks
> internally so duplicate blocks could end up in the same erase block
> which negates the benefits of doing metadata duplication."

You can force dup but, per the man page, whether or not that is 
beneficial is questionable.
>
>> multiple devices as it means one device failure will likely cause
>> critical loss of metadata.
> That's the part where I'm not clear:
>
> What's the difference between -m dup and -m raid1
> Don't they both say 2 copies of the metadata?
> Is -m dup only valid for a single drive, while -m raid1 for 2+ drives?

The issue is that -m dup will always put both copies on a single device. 
If you lose that device, you've lost both (all) copies of that metadata. 
With -m raid1 the second copy is on a *different* device.

I believe dup *can* be used with multiple devices but mkfs.btrfs might 
not let you do it from the get-go. The way most have gotten there is by 
having dup on a single device and then, after adding another device, 
they didn't convert the metadata to raid1.
>
>>> If so, and I have a -d raid0 -m raid0 filesystem, are both copies of the
>>> metadata on the same drive or is btrfs smart enough to spread out
>>> metadata copies so that they're not on the same drive?
>> This will mean there is only a single copy, albeit striped across
>> the drives.
> Ok, so -m raid0 only means a single copy of metadata, thanks for
> explaining.
>
>> good for redundancy. A total failure of a single device will mean
>> any large files will be lost and only files smaller than the default
>> per-disk stripe width (I believe this used to be 4K and is now 16K -
>> I could be wrong) stored only on the remaining disk will be
>> available.
>   
> Gotcha, thanks for confirming, so -m raid1 -d raid0 really only protects
> against metadata corruption or a single block loss, but otherwise if you
> lost a drive in a 2 drive raid0, you'll have lost more than just half
> your files.
>
>> The scenario you mentioned at the beginning, "if I lose a drive,
>> I'll still have full metadata for the entire filesystem and only
>> missing files" is more applicable to using "-m raid1 -d single".
>> Single is not geared towards performance and, though it doesn't
>> guarantee a file is only on a single disk, the allocation does mean
>> that the majority of all files smaller than a chunk will be stored
>> on only one disk or the other - not both.
> Ok, so in other words:
> -d raid0: if you lose 1 drive out of 2, you may end up with small files
> and the rest will be lost
>
> -d single: you're more likely to have files be on one drive or the
> other, although there is no guarantee there either.
>
> Correct?

Correct
>
> Thanks,
> Marc





* Re: Is metadata redundant over more than one drive with raid0 too?
  2014-05-03 23:27 Is metadata redundant over more than one drive with raid0 too? Marc MERLIN
  2014-05-04  6:57 ` Brendan Hide
@ 2014-05-04 21:49 ` Duncan
  1 sibling, 0 replies; 15+ messages in thread
From: Duncan @ 2014-05-04 21:49 UTC
  To: linux-btrfs

Marc MERLIN posted on Sat, 03 May 2014 16:27:02 -0700 as excerpted:

> So, I was thinking. In the past, I've done this:
> mkfs.btrfs -d raid0 -m raid1 -L btrfs_raid0 /dev/mapper/raid0d*
> 
> My rationale at the time was that if I lose a drive, I'll still have
> full metadata for the entire filesystem and only missing files.
> If I have raid1 with 2 drives, I should end up with 4 copies of each
> file's metadata, right?

Brendan has answered well, but sometimes a second way of putting things 
helps, especially when there was originally some misconception to clear 
up, as seems to be the case here.  So let me try to be that rewording. 
=:^)

No.  Btrfs raid1 (the multi-device metadata default) is (still only) two 
copies, as is btrfs dup (which is the single-device metadata default 
except for SSDs).  The distinction is that dup is designed for the single 
device case and puts both copies on that single device, while raid1 is 
designed for the multi-device case, and ensures that the two copies 
always go to different devices, so the loss of any single device won't
kill the metadata.
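The point that two copies stay two copies, whatever the device count, can be shown with a toy placement model. `place_copies` and `survives_loss` are hypothetical helpers, and the round-robin choice of devices is an assumption made for illustration, not how the btrfs chunk allocator actually picks devices; only the invariants (dup: both copies on one device, raid1: two distinct devices, never more than two copies) come from the explanation above.

```python
# Toy placement model for the two metadata copies (hypothetical helper,
# not btrfs's allocator): dup keeps both copies on one device, raid1
# always uses two different devices, and neither makes more than two.

def place_copies(profile: str, devices: list[str], chunk: int) -> tuple[str, str]:
    """Return the devices holding copy 1 and copy 2 of metadata chunk `chunk`."""
    if profile == "dup":
        dev = devices[chunk % len(devices)]  # some single device
        return (dev, dev)
    if profile == "raid1":
        if len(devices) < 2:
            raise ValueError("raid1 needs at least two devices")
        i = chunk % len(devices)
        return (devices[i], devices[(i + 1) % len(devices)])
    raise ValueError(f"profile not modelled here: {profile}")


def survives_loss(copies: tuple[str, str], failed: str) -> bool:
    """True if at least one copy lives on a device other than `failed`."""
    return any(dev != failed for dev in copies)


disks = ["sda", "sdb", "sdc", "sdd"]  # four devices, still only two copies
for profile in ("dup", "raid1"):
    ok = all(survives_loss(place_copies(profile, disks, c), failed)
             for c in range(8) for failed in disks)
    print(profile, "survives any single-device loss:", ok)
# dup survives any single-device loss: False
# raid1 survives any single-device loss: True
```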

Additional details:

I am not aware of any current possibility of having more than two copies, 
no matter the mode, with a possible exception during mode conversion (say 
between raid1 and raid6), altho even then, there should be only two /
active/ copies.

Dup mode being designed for single device usage only, it's normally not 
available on multi-device filesystems.  As Brendan mentions, the way 
people sometimes get it is starting with a single-device filesystem in dup 
mode and adding devices.  If they then fail to balance-convert, old 
metadata chunks will be dup mode on the original device, while new ones 
should be created as raid1 by default.  Of course a partial balance-
convert will be just that, partial, with whatever failed to convert still 
dup mode on the original single device.
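For reference, the balance-convert mentioned above is done with `btrfs balance start` and its `convert=` filter, for example `-mconvert=raid1` for metadata. The sketch below only builds the command line so it can be reviewed before running it as root; `balance_convert_argv` is a hypothetical helper and `/mnt/pool` a made-up mount point. Whether the kernel accepts a given target profile depends on its version (per this thread, dup on a multi-device filesystem was refused at the time).

```python
import shlex
from typing import Optional

# Profile names usable with the convert= balance filter. Which ones the kernel
# accepts for a given filesystem depends on its version and device count.
PROFILES = {"single", "dup", "raid0", "raid1", "raid10", "raid5", "raid6"}


def balance_convert_argv(mountpoint: str, data: Optional[str] = None,
                         metadata: Optional[str] = None, soft: bool = False) -> list[str]:
    """Build (but do not run) a `btrfs balance start` command that converts profiles.

    soft=True adds the 'soft' modifier, which skips chunks already in the target
    profile; useful when finishing a partial or interrupted conversion.
    """
    argv = ["btrfs", "balance", "start"]
    for flag, profile in (("-d", data), ("-m", metadata)):
        if profile is None:
            continue
        if profile not in PROFILES:
            raise ValueError(f"unknown profile: {profile}")
        argv.append(f"{flag}convert={profile}" + (",soft" if soft else ""))
    if len(argv) == 3:
        raise ValueError("nothing to convert: pass data= and/or metadata=")
    argv.append(mountpoint)
    return argv


# After `btrfs device add` on a filesystem that started out single-device with
# dup metadata, convert the leftover dup metadata chunks to raid1:
print(shlex.join(balance_convert_argv("/mnt/pool", metadata="raid1", soft=True)))
# btrfs balance start -mconvert=raid1,soft /mnt/pool
```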

As a result, originally (and I believe still) it was impossible to 
configure dup mode on a multi-device filesystem at all.  However, someone 
did post a request that dup mode on multi-device be added as a (normally 
still heavily discouraged) option, to allow a conversion back to single-
device, without at any point dropping to non-redundant single-copy-only.  
Using the two-device raid1 to single-device dup conversion as an example, 
currently you can't btrfs device delete below two devices as that's no 
longer raid1.  Of course if both data and metadata are raid1, it's 
possible to physically disconnect one device, leaving the other as the 
only online copy but having the disconnected one in reserve, but that's 
not possible when the data is single mode, and even if it was, that 
physical disconnection will trigger read-only mode on the filesystem as it's
no longer raid1, thereby making the balance-conversion back to dup 
impossible.  And you can't balance-convert to dup on a multi-device 
filesystem, so balance-converting to single, thereby losing the 
protection of the second copy, then doing the btrfs device delete, 
becomes the only option.  Thus the request to allow balance-convert to dup 
mode on a multi-device filesystem, for the sole purpose of then allowing 
btrfs device delete of the second device, converting it back to a single-
device filesystem without ever losing second-copy redundancy protection.

Finally, for the single-device-filesystem case, dup mode is normally only 
allowed for metadata (where it is again the default, except on ssd), 
*NOT* for data.  However, someone noticed and posted a side-effect of
mixed-block-group mode (used by default on filesystems under 1 GiB, but
normally discouraged on filesystems above 32-64 gig for performance
reasons): because in mixed-bg mode data and metadata share the same
chunks, mixed-bg mode actually allows (and defaults to, except on SSD)
dup for data as well as metadata.  There was some discussion in that
thread as to whether that was a deliberate feature or simply an 
accidental result of the sharing.  Chris Mason confirmed it was the 
latter.  The intention has been that dup mode is a special case for 
rather critical metadata on a single device in order to provide better
protection for it, and the fact that mixed-bg mode allows (indeed, even 
defaults to) dup mode for data was entirely an accident of mixed-bg mode 
implementation -- albeit one that's pretty much impossible to remove.  
But given that accident and the fact that some users do appreciate the 
ability to do dup mode data via mixed-bg mode on larger single-device 
filesystems even if it reduces performance and effectively halves storage 
space, I expect/predict that at some point, dup mode for data will be 
added as an option as well, thereby eliminating the performance impact of 
mixed-bg mode while offering single-device duplicate data redundancy on 
large filesystems, for those that value the protection such duplication 
provides, particularly given btrfs' data checksumming and integrity 
features.

> But now I have 2 questions

> 1) btrfs has two copies of all metadata on even a single drive, correct?

By default, yes, except on SSD, where dup remains an option.  But not if 
single (the default metadata mode for single-device SSD) or (for multi-
device) raid0 modes are chosen instead of dup.
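The defaults discussed so far (dup metadata on a single rotating disk, single on SSD, raid1 with several devices, and mixed-bg data inheriting the metadata profile) boil down to a small lookup. This is a sketch of the behaviour as described in this thread, not a copy of mkfs.btrfs logic; the "single" data default for a non-mixed single device comes from my own understanding of mkfs.btrfs rather than from the thread. Later btrfs-progs releases changed some of these defaults, so check the mkfs.btrfs man page for your version. Both function names are hypothetical.

```python
# Defaults as described in this thread (2014-era mkfs.btrfs); later
# btrfs-progs releases changed some of them, so treat this as history.

def single_device_defaults(ssd: bool, mixed_bg: bool = False) -> tuple[str, str]:
    """Return the (data, metadata) profiles for a one-device filesystem."""
    # dup metadata on rotating disks; single on SSD, since the SSD may remap
    # both copies into the same erase block anyway.
    metadata = "single" if ssd else "dup"
    # Mixed block groups put data and metadata in the same chunks, so data
    # inherits the metadata profile: the accidental "dup data" case.
    data = metadata if mixed_bg else "single"
    return data, metadata


def multi_device_metadata_default() -> str:
    """Two or more devices: raid1, i.e. two copies on two different devices."""
    return "raid1"


for ssd in (False, True):
    for mixed in (False, True):
        print(f"ssd={ssd} mixed_bg={mixed} ->", single_device_defaults(ssd, mixed))
# ssd=False mixed_bg=False -> ('single', 'dup')
# ssd=False mixed_bg=True -> ('dup', 'dup')
# ssd=True mixed_bg=False -> ('single', 'single')
# ssd=True mixed_bg=True -> ('single', 'single')
```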

> If so, and I have a -d raid0 -m raid0 filesystem, are both copies of the
> metadata on the same drive or is btrfs smart enough to spread out
> metadata copies so that they're not on the same drive?

If you specify raid0 metadata, there's no second metadata copy, on the 
same drive or elsewhere.  Further, raid0 mode stripes metadata across all 
available devices so it's even more fragmented than single mode, 
practically eliminating any chance of recovery in the event of device 
failure.

IOW, if you have raid0 metadata and a device fails or even simply does 
what would be a relatively minor temporary dropout in other raid cases, 
consider the filesystem toast.  (If you're extremely lucky and the 
dropout was temporary, such that you can recreate the raid0 with the 
dropped device, you /may/ be able to save it.  And it should drop to read-
only mode as soon as a dropped device is detected to help maximize the 
chance of that.  But don't count on it!  Simply don't use raid0 for 
anything you value at all, and you won't have to worry about it.)

> 2) does btrfs lay out files on raid0 so that files aren't striped across
> more than one drive, so that if I lose a drive, I only lose whole files,
> but not little chunks of all my files, making my entire FS toast?

No.  That's the distinction between raid0 mode and single mode.  Raid0 
mode effectively sacrifices everything else for (single thread sequential 
access) speed.  If a device drops out, consider anything that was raid0 
toast.

In theory at least, if the metadata is intact (as it should be with a 
single device drop for metadata raid1 mode), a file smaller than a single 
raid0 "strip" (the size of a stripe on a single device) may still be 
intact as well.  And as more devices are added to the raid0 stripe, 
dropping a single one does allow the lucky-case recovery file-size to 
increase as well, up to stripe-size minus strip-size for a single device 
drop-out, while also increasing the absolute chances for sub-strip-size 
files since their chances approximate (N-1)/N (where N is the number of
devices in the stripe and -1 is the single device drop).
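The arithmetic above can be checked with a short calculation, assuming files are strip-aligned and that a file's starting strip is equally likely to be on any device. A file covering s consecutive strips (1 <= s <= N-1) avoids the one failed device for N-s of the N possible starting positions, so a sub-strip file (s = 1) survives with probability (N-1)/N, and the largest file that can survive at all covers N-1 strips, i.e. stripe size minus strip size. The function names are hypothetical, and the 64 KiB strip is only an example value.

```python
from fractions import Fraction


def small_file_survival(n_devices: int) -> Fraction:
    """Chance that a sub-strip file avoids the one failed device: (N-1)/N."""
    return Fraction(n_devices - 1, n_devices)


def largest_lucky_file(n_devices: int, strip: int) -> int:
    """Stripe size minus strip size: the N-1 strips between two strips of the
    failed device. Only reachable in the luckiest, strip-aligned placement."""
    return n_devices * strip - strip


def brute_force_survival(n_devices: int, file_strips: int, failed: int = 0) -> Fraction:
    """Fraction of strip-aligned start positions (over one stripe cycle) where
    `file_strips` consecutive strips avoid the failed device."""
    safe = 0
    for start in range(n_devices):
        devs = {(start + j) % n_devices for j in range(file_strips)}
        if failed not in devs:
            safe += 1
    return Fraction(safe, n_devices)


for n in (2, 3, 4):
    print(n, small_file_survival(n), largest_lucky_file(n, 64 * 1024))
# 2 1/2 65536
# 3 2/3 131072
# 4 3/4 196608
```

The brute-force count agrees with the closed form for s = 1, and drops to 1/N for a file of N-1 strips, which is why Duncan calls the larger recoveries the lucky case.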

Additionally, it can be noted that if a file is small enough, btrfs may 
actually store it in metadata instead of going to the trouble of 
allocating a data chunk extent for it, and the sub-block end of a file 
may similarly be stored in metadata instead of taking another whole block 
of data.  (Reiserfs users will be familiar with this as tail-packing.)  
Of course if the metadata is dup/raid1/whatever instead of raid0/single, 
these small metadata-only-stored files should be recoverable as well.

But those are the lucky cases.  As I said above, the general rule is that 
anything on raid0 is destroyed if a device drops, so you never NEVER 
stick anything on raid0 that you value at all, and then you won't have to 
worry about it! =:^)

(Meanwhile, from experience I can say that the speed of raid0 isn't 
always as good as one might expect, either.  It does speed up the single-
thread sequential-access case as one might expect, but on today's multi-
core multi-threading many-tasking systems, single-IO-thread filesystem 
access is actually rather rare.  Then of course there's random-access as 
well.  As a result, at least for my use-case which apparently includes 
far more independent task parallel read than some, I actually found mdraid 
with its N-copies raid1 and surprisingly good parallel multi-IO-thread 
read scheduling faster than its raid0, with writes still occurring at 
normal single-device speed (unlike raid5/6 which penalizes writes) due to 
the bottleneck being the physical spinning rust.  (Obviously fast SSDs 
will change that bottleneck factor, with the individual bus to SSD speed 
usually becoming the bottleneck except for the underpowered CPU case, but 
raid1 write speed still remains reasonably close to the slowest device 
write speed in most cases.)  Of course btrfs raid1 currently limits to 
two copies and may or may not be as efficiently scheduled as md/raid1, 
but that's yet another reason why I really /really/ want N-way-mirroring 
for btrfs, since two-thread-parallel-read-access certainly beats single-
thread, but from experience I know that at least for my use-case and on 
spinning-rust, a 3-4-thread-parallel-read pattern is common enough that I 
see the benefits.  That said, I'm switching to SSD now, and the speed 
there is sufficient that I suspect I'm unlikely to see much benefit above 
3-thread-parallel and I might not actually see much from 3-thread-
parallel either.  But I'd sure like the chance to try it, and with the 
data-integrity benefits of 3-way-mirroring on btrfs as well, I'm really 
eager to see the feature introduced. =:^)

Of course the much safer and more flexible but still speedy compromise is 
raid10, which remains the general case ideal -- with the only caveat 
being the relatively high four-device-minimum entry cost.  (Tho
mdraid10 does have some flexibility in that regard and can do its form of
raid10 on fewer than four devices, at the cost of increased conceptual
complexity and some speed.)

The bottom line remains, however, don't put anything on raid0 that you 
value at all, such that you're entirely OK considering it toast and 
simply putting the remaining devices to other uses instead of even trying 
to recover, if a device drops out of the raid0.  Raid0 is optimized for 
one thing only, speed, and that in only one rather narrow and 
increasingly uncommon in the modern age use-case, single-thread-
sequential-access.  And the price it pays for that optimization is, IMO, 
very rarely worth it, tho if you have that use-case and are prepared to 
pay the cost in terms of data-loss risk, it can /indeed/ be worth it.  
Just be sure that's your use case, preferably testing a raid0 deployment 
in actual use to be sure it's giving you that extra speed, because in 
many cases, it won't, and then it's simply NOT worth the data risk cost, 
period.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


* Re: Is metadata redundant over more than one drive with raid0 too?
  2014-05-04  7:24   ` Marc MERLIN
  2014-05-04  7:44     ` Brendan Hide
@ 2014-05-05  0:46     ` Daniel Lee
  2014-05-05  5:06       ` Marc MERLIN
  1 sibling, 1 reply; 15+ messages in thread
From: Daniel Lee @ 2014-05-05  0:46 UTC
  To: Marc MERLIN; +Cc: Brendan Hide, linux-btrfs

On 05/04/2014 12:24 AM, Marc MERLIN wrote:
>  
> Gotcha, thanks for confirming, so -m raid1 -d raid0 really only protects
> against metadata corruption or a single block loss, but otherwise if you
> lost a drive in a 2 drive raid0, you'll have lost more than just half
> your files.
>
>> The scenario you mentioned at the beginning, "if I lose a drive,
>> I'll still have full metadata for the entire filesystem and only
>> missing files" is more applicable to using "-m raid1 -d single".
>> Single is not geared towards performance and, though it doesn't
>> guarantee a file is only on a single disk, the allocation does mean
>> that the majority of all files smaller than a chunk will be stored
>> on only one disk or the other - not both.
> Ok, so in other words:
> -d raid0: if you lose 1 drive out of 2, you may end up with small files
> and the rest will be lost
>
> -d single: you're more likely to have files be on one drive or the
> other, although there is no guarantee there either.
>
> Correct?
>
> Thanks,
> Marc
This often seems to confuse people and I think there is a common
misconception that the btrfs raid/single/dup features work at the file
level when in reality they work at a level closer to lvm/md.

If someone told you that they lost a device out of a jbod or multi disk
lvm group (somewhat analogous to -d single) with ext on top, you would
expect them to lose data in any file that had a fragment in the lost
region (let's ignore metadata for a moment). This is potentially up to
100% of the files but this should not be a surprising result. Similarly,
someone who has lost a disk out of a md/lvm raid0 volume should not be
surprised to have a hard time recovering any data at all from it.
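The comparison above can be put into a few lines of code: a jbod/linear concatenation maps logical offsets to devices by filling one device after another, while raid0 deals them out in strips. Under the linear mapping a file is only safe from a device loss if every one of its extents happens to land on a surviving device, so a single fragment on the lost disk is enough to damage it. These helpers are hypothetical and ignore the details of any real lvm, md, or btrfs implementation; the device sizes are arbitrary toy units.

```python
# Logical-to-physical mapping for two multi-disk layouts (hypothetical
# helpers for illustration; lvm/md/btrfs each have their own details).

def linear_device(offset: int, device_sizes: list[int]) -> tuple[int, int]:
    """jbod/linear: fill device 0, then device 1, ... Return (device, device_offset)."""
    for i, size in enumerate(device_sizes):
        if offset < size:
            return i, offset
        offset -= size
    raise ValueError("offset past end of volume")


def raid0_device(offset: int, n_devices: int, strip: int) -> tuple[int, int]:
    """raid0: strip k goes to device k % N, at row k // N on that device."""
    k, within = divmod(offset, strip)
    return k % n_devices, (k // n_devices) * strip + within


def devices_for_file_linear(extents: list[tuple[int, int]], device_sizes: list[int]) -> set[int]:
    """Devices holding any byte of a file given as (logical_start, length) extents."""
    devs: set[int] = set()
    for start, length in extents:
        first, _ = linear_device(start, device_sizes)
        last, _ = linear_device(start + length - 1, device_sizes)
        devs.update(range(first, last + 1))  # linear is monotonic, so no gaps
    return devs


sizes = [100, 100]                # two tiny "disks"
contiguous = [(10, 5)]
fragmented = [(10, 5), (150, 5)]  # one fragment landed on the second disk
print(devices_for_file_linear(contiguous, sizes))  # {0}
print(devices_for_file_linear(fragmented, sizes))  # {0, 1}: damaged if disk 1 dies
print(raid0_device(130, 2, 64))                    # (0, 66)
```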



* Re: Is metadata redundant over more than one drive with raid0 too?
  2014-05-04  7:44     ` Brendan Hide
@ 2014-05-05  1:27       ` Marc MERLIN
  2014-05-06 19:05         ` Duncan
  2014-05-06 19:39         ` Duncan
  0 siblings, 2 replies; 15+ messages in thread
From: Marc MERLIN @ 2014-05-05  1:27 UTC
  To: Brendan Hide, Duncan; +Cc: linux-btrfs

On Sun, May 04, 2014 at 09:44:41AM +0200, Brendan Hide wrote:
> >Ah, I see the man page now "This is because SSDs can remap blocks
> >internally so duplicate blocks could end up in the same erase block
> >which negates the benefits of doing metadata duplication."
> 
> You can force dup but, per the man page, whether or not that is
> beneficial is questionable.

So the reason I was confused originally was this:
legolas:~# btrfs fi df /mnt/btrfs_pool1
Data, single: total=734.01GiB, used=435.39GiB
System, DUP: total=8.00MiB, used=96.00KiB
System, single: total=4.00MiB, used=0.00
Metadata, DUP: total=8.50GiB, used=6.74GiB
Metadata, single: total=8.00MiB, used=0.00

This is on my laptop with an SSD. Clearly btrfs is using duplicate
metadata on an SSD, and I did not ask it to do so.
Note that I'm still generally happy with the idea of duplicate metadata
on an SSD even if it's not bulletproof.
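If you want to check from a script which profiles a filesystem's chunks actually use (here: DUP metadata on an SSD, plus empty single System and Metadata chunks), a minimal parser for the `btrfs fi df` lines shown above could look like this. It is a hypothetical helper that assumes exactly the line format in that listing; parsing human-readable output is brittle, and other btrfs-progs versions may print extra lines (which it silently skips) or a different format.

```python
import re
from collections import defaultdict

# Matches lines like "Metadata, DUP: total=8.50GiB, used=6.74GiB".
# Format assumed from the listing above; other versions may differ.
FI_DF_LINE = re.compile(
    r"^(?P<kind>\w+), (?P<profile>\w+): "
    r"total=(?P<total>[\d.]+)(?P<tunit>(?:[KMGTP]i)?B)?, "
    r"used=(?P<used>[\d.]+)(?P<uunit>(?:[KMGTP]i)?B)?$"
)
UNITS = {None: 1, "B": 1, "KiB": 2**10, "MiB": 2**20, "GiB": 2**30,
         "TiB": 2**40, "PiB": 2**50}


def parse_fi_df(text: str) -> dict[str, dict[str, float]]:
    """Map chunk kind -> {profile: bytes used}, skipping unrecognised lines."""
    usage: dict[str, dict[str, float]] = defaultdict(dict)
    for line in text.splitlines():
        m = FI_DF_LINE.match(line.strip())
        if m:
            usage[m["kind"]][m["profile"]] = float(m["used"]) * UNITS[m["uunit"]]
    return dict(usage)


def profiles_in_use(usage: dict[str, dict[str, float]]) -> dict[str, list[str]]:
    """Profiles that actually hold something (used > 0), per chunk kind."""
    return {kind: sorted(p for p, used in profs.items() if used > 0)
            for kind, profs in usage.items()}


sample = """\
Data, single: total=734.01GiB, used=435.39GiB
System, DUP: total=8.00MiB, used=96.00KiB
System, single: total=4.00MiB, used=0.00
Metadata, DUP: total=8.50GiB, used=6.74GiB
Metadata, single: total=8.00MiB, used=0.00
"""
print(profiles_in_use(parse_fi_df(sample)))
# {'Data': ['single'], 'System': ['DUP'], 'Metadata': ['DUP']}
```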

> >What's the difference between -m dup and -m raid1
> >Don't they both say 2 copies of the metadata?
> >Is -m dup only valid for a single drive, while -m raid1 for 2+ drives?
> 
> The issue is that -m dup will always put both copies on a single
> device. If you lose that device, you've lost both (all) copies of
> that metadata. With -m raid1 the second copy is on a *different*
> device.

Aaah, that explains it now, thanks. So -m dup is indeed kind of stupid
if you have more than one drive.
 
> I believe dup *can* be used with multiple devices but mkfs.btrfs
> might not let you do it from the get-go. The way most have gotten
> there is by having dup on a single device and then, after adding
> another device, they didn't convert the metadata to raid1.

Right, that also makes sense.

> >-d raid0: if you lose 1 drive out of 2, you may end up with small files
> >and the rest will be lost
> >
> >-d single: you're more likely to have files be on one drive or the
> >other, although there is no guarantee there either.
> >
> >Correct?
> 
> Correct

Thanks :)

On Sun, May 04, 2014 at 09:49:24PM +0000, Duncan wrote:
> Brendan has answered well, but sometimes a second way of putting things 
> helps, especially when there was originally some misconception to clear 
> up, as seems to be the case here.  So let me try to be that rewording. 
> =:^)

Sure, that can always help.
 
> No.  Btrfs raid1 (the multi-device metadata default) is (still only) two 
> copies, as is btrfs dup (which is the single-device metadata default 
> except for SSDs).  The distinction is that dup is designed for the single 
> device case and puts both copies on that single device, while raid1 is 
> designed for the multi-device case, and ensures that the two copies 
> always go to different devices, so loss of the single device won't kill 
> the metadata.

Yep, I got that now.

> Dup mode being designed for single device usage only, it's normally not 
> available on multi-device filesystems.  As Brendan mentions, the way 
> people sometimes get it is starting with a single-device filesystem in dup 
> mode and adding devices.  If they then fail to balance-convert, old 
> metadata chunks will be dup mode on the original device, while new ones 
> should be created as raid1 by default.  Of course a partial balance-
> convert will be just that, partial, with whatever failed to convert still 
> dup mode on the original single device.

Yes, that makes sense too.
 
> Finally, for the single-device-filesystem case, dup mode is normally only 
> allowed for metadata (where it is again the default, except on ssd), 
> *NOT* for data.  However, someone noticed and posted that one of the side-
> effects of mixed-block-group mode, used by default on filesystems under 1 
> GiB but normally discouraged on filesystems above 32-64 gig for 
> performance reasons, because in mixed-bg mode data and metadata share the 
> same chunks, mixed-bg mode actually allows (and defaults to, except on 
> SSD) dup for data as well as metadata.  There was some discussion in that 

Yes, I read that. That's an interesting side effect which could be used
in some cases.

> thread as to whether that was a deliberate feature or simply an 
> accidental result of the sharing.  Chris Mason confirmed it was the 
> latter.  The intention has been that dup mode is a special case for 
> rather critical metadata on a single device in ordered to provide better 
> protection for it, and the fact that mixed-bg mode allows (indeed, even 
> defaults to) dup mode for data was entirely an accident of mixed-bg mode 
> implementation -- albeit one that's pretty much impossible to remove.  
> But given that accident and the fact that some users do appreciate the 
> ability to do dup mode data via mixed-bg mode on larger single-device 
> filesystems even if it reduces performance and effectively halves storage 
> space, I expect/predict that at some point, dup mode for data will be 
> added as an option as well, thereby eliminating the performance impact of 
> mixed-bg mode while offering single-device duplicate data redundancy on 
> large filesystems, for those that value the protection such duplication 
> provides, particularly given btrfs' data checksumming and integrity 
> features.

This would indeed be nice for some uses, great to know.
 
(...)
> No.  That's the distinction between raid0 mode and single mode.  Raid0 
> mode effectively sacrifices everything else for (single thread sequential 
> access) speed.  If a device drops out, consider anything that was raid0 
> toast.

Thanks for confirming.

> But those are the lucky cases.  As I said above, the general rule is that 
> anything on raid0 is destroyed if a device drops, so you never NEVER 
> stick anything on raid0 that you value at all, and then you won't have to 
> worry about it! =:^)
 
That's correct.
The original reason why I was asking myself this question and trying to
figure out how much better 
-m raid1 -d raid0
was over
-m raid0 -d raid0

I think the summary is that in the first case, you're going to be
able to recover all/most small files (think maildir) if you lose one
device, whereas in the 2nd case, with half the metadata missing, your FS
is pretty much fully gone.
Fair to say that?

Now, if I don't care about speed, but wouldn't mind recovering a few
bits should something happen (actually in my case mostly knowing the
state of the filesystem when a drive was lost so that I can see how
many new files showed up since my last backup), it sounds like it
wouldn't be bad to use:
-m raid1 -d linear

This will not give me the speed boost from raid0 which I don't care
about, it will give me metadata redundancy, and due to linear, there is
a decent chance that half my files are intact on the remaining drive
(depending on their size apparently).


(snip)
> sequential-access.  And the price it pays for that optimization is, IMO, 
> very rarely worth it, tho if you have that use-case and are prepared to 
> pay the cost in terms of data-loss risk, it can /indeed/ be worth it.  
> Just be sure that's your use case, preferably testing a raid0 deployment 
> in actual use to be sure it's giving you that extra speed, because in 
> many cases, it won't, and then it's simply NOT worth the data risk cost, 

So one place I use it is not for speed but for one FS that gives me more
space without redundancy (rotating buffer streaming video from security
cams).
At the time I used -m raid1 -d raid0, but it sounds for slightly extra
recoverability, I should have used -m raid1 -d linear (and yes, I
understand that one should not consider a -d linear recoverable when a
drive went missing).

Thanks for going through those scenarios with me :)

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901

* Re: Is metadata redundant over more than one drive with raid0 too?
  2014-05-05  0:46     ` Daniel Lee
@ 2014-05-05  5:06       ` Marc MERLIN
  2014-05-06 17:16         ` Duncan
  0 siblings, 1 reply; 15+ messages in thread
From: Marc MERLIN @ 2014-05-05  5:06 UTC (permalink / raw)
  To: Daniel Lee; +Cc: Brendan Hide, linux-btrfs

On Sun, May 04, 2014 at 05:46:00PM -0700, Daniel Lee wrote:
> This often seems to confuse people and I think there is a common
> misconception that the btrfs raid/single/dup features work at the file
> level when in reality they work at a level closer to lvm/md.
> 
> If someone told you that they lost a device out of a jbod or multi disk
> lvm group(somewhat analogous to -d single) with ext on top you would
> expect them to lose data in any file that had a fragment in the lost
> region (lets ignore metadata for a moment). This is potentially up to
> 100% of the files but this should not be a surprising result. Similarly,
> someone who has lost a disk out of a md/lvm raid0 volume should not be
> surprised to have a hard time recovering any data at all from it.

That's true, but in this case I barely see the point of -m single vs -m
raid0. It sounds like they both stripe data anyway, maybe not at the
same level, but if both are striped, then they're almost the same in my
book :)

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901

* Re: Is metadata redundant over more than one drive with raid0 too?
  2014-05-05  5:06       ` Marc MERLIN
@ 2014-05-06 17:16         ` Duncan
  2014-05-07  8:18           ` raid0 vs single, and should we allow -mdup by default on SSDs? Marc MERLIN
  0 siblings, 1 reply; 15+ messages in thread
From: Duncan @ 2014-05-06 17:16 UTC (permalink / raw)
  To: linux-btrfs

Marc MERLIN posted on Sun, 04 May 2014 22:06:17 -0700 as excerpted:

> That's true, but in this case I barely see the point of -m single vs -m
> raid0. It sounds like they both stripe data anyway, maybe not at the
> same level, but if both are striped, then they're almost the same in my
> book :)

Single only stripes in such extremely large (1 GiB data, quarter-GiB 
metadata, per strip) chunks that it doesn't matter for speed, and then 
only as a result of its chunk allocation policy.  If one can define such 
large strips as striping, which it is in a way, but not really in the 
practical sense.

The effect of a lost device, then, is more or less random, tho for single 
metadata the effect is likely to be quite large up to total loss, due to 
the damage to the tree.  It's not out of thin air that the multi-device 
metadata default is raid1 (which unlike the single-device case, should be 
the same on SSD or spinning rust, since by definition the copies will be 
on different devices and thus cannot be affected by SSDs' FTL-level de-
dup).

So the below assumes copies=2 raid1 metadata and is thus only considering 
single vs. raid0 data.

For single data, only files that happened to be partially allocated on 
the lost device will be damaged.  For file sizes above the 1 GiB data 
chunk size, the chance of damage is therefore rather high, as by 
definition the file will require multiple chunks and the chances of one 
of them being on the lost device go up accordingly.  But for file sizes 
significantly under 1 GiB, where data fragmentation is relatively low at 
least (think a recent rebalance or (auto)defrag), relatively small files 
are very likely to be located on a single chunk and thus either all there 
or all missing, depending on whether that chunk was on the missing device 
or not.

That contrasts with raid0, where the striping is at sizes well under a 
chunk (memory page size or 4 MiB on x86/amd64 data I believe, tho the 
fact that files under the 16 MiB node size may actually be entirely 
folded into metadata and not have a data extent allocation at all skews 
things for up to the 16 MiB metadata node size), so the definition of 
"small file likely to be recovered" is **MUCH** smaller on raid0, than on 
single.

Effectively, with raid0 data you're only (relatively) likely to recover files 
smaller than 16 MiB, while with single data, it's files smaller than 1 GiB.

Big difference!
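To put the single-vs-raid0 comparison in concrete terms, here is a toy Python model (not btrfs code; the function names are made up) of which devices a file lands on. It treats the file as one contiguous extent and assumes allocation units rotate round-robin over the devices: 1 GiB chunks for single data, and stripe elements for raid0, using 64 KiB (the smaller of the sizes guessed at in this thread) as an assumed stripe size. Inlining of small files into metadata is ignored.

```python
# Toy model, not btrfs code: which devices hold any byte of a contiguous file?
# Assumptions (for illustration only):
#   * the file is a single contiguous extent starting at byte offset `start`
#   * "unit" is the allocation granule: a 1 GiB chunk for single data, or one
#     raid0 stripe element (assumed 64 KiB here) for raid0 data
#   * consecutive units are placed round-robin: unit i lives on device i % n

KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def devices_touched(start: int, length: int, unit: int, n_devices: int) -> set:
    """Return the set of device indices holding part of [start, start + length)."""
    if length <= 0:
        return set()
    first = start // unit
    last = (start + length - 1) // unit
    # n consecutive units already cover every device, so stop after n of them.
    last = min(last, first + n_devices - 1)
    return {u % n_devices for u in range(first, last + 1)}


def survives(start: int, length: int, unit: int, n_devices: int, lost: int) -> bool:
    """True if no part of the file was on the lost device."""
    return lost not in devices_touched(start, length, unit, n_devices)


# A 10 MiB file on a 3-device filesystem, device 2 lost:
print(survives(0, 10 * MIB, GIB, 3, lost=2))       # single: True
print(survives(0, 10 * MIB, 64 * KIB, 3, lost=2))  # raid0:  False
```

Shrinking the allocation unit from 1 GiB to tens of KiB is what shrinks the size of file that is likely to survive losing a device.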

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

* Re: Is metadata redundant over more than one drive with raid0 too?
  2014-05-05  1:27       ` Marc MERLIN
@ 2014-05-06 19:05         ` Duncan
  2014-05-06 19:39         ` Duncan
  1 sibling, 0 replies; 15+ messages in thread
From: Duncan @ 2014-05-06 19:05 UTC (permalink / raw)
  To: linux-btrfs

Marc MERLIN posted on Sun, 04 May 2014 18:27:19 -0700 as excerpted:

> On Sun, May 04, 2014 at 09:44:41AM +0200, Brendan Hide wrote:
>> >Ah, I see the man page now "This is because SSDs can remap blocks
>> >internally so duplicate blocks could end up in the same erase block
>> >which negates the benefits of doing metadata duplication."
>> 
>> You can force dup but, per the man page, whether or not that is
>> beneficial is questionable.
> 
> So the reason I was confused originally was this:
> legolas:~# btrfs fi df /mnt/btrfs_pool1
> Data, single: total=734.01GiB, used=435.39GiB
> System, DUP: total=8.00MiB, used=96.00KiB
> System, single: total=4.00MiB, used=0.00
> Metadata, DUP: total=8.50GiB, used=6.74GiB
> Metadata, single: total=8.00MiB, used=0.00
> 
> This is on my laptop with an SSD. Clearly btrfs is using duplicate
> metadata on an SSD, and I did not ask it to do so.
> Note that I'm still generally happy with the idea of duplicate metadata
> on an SSD even if it's not bulletproof.

In regard to metadata defaulting to single rather than the (otherwise) dup 
on single-device ssd:

1) In order to do that, btrfs (I guess mkfs.btrfs in this case) must be 
able to detect that the device *IS* ssd.  Depending on the SSD, the 
kernel version, and whether the btrfs is being created direct on bare-
metal device or on some device layered (lvm or dmcrypt or whatever) on 
top of the bare metal, btrfs may or may not successfully detect that.

Obviously in your case[1] the ssd wasn't detected.

Question:  Does btrfs detect ssd and automatically add it to the mount 
options for that btrfs?  I suspect not, thus consistent behavior in not 
detecting the SSD.  FWIW, it is detected here.  I've never specifically 
added ssd to any of my btrfs mount options, but it's always there in 
/proc/self/mounts when I check.[2]
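Checking for that by hand means looking for `ssd` in the options column of /proc/self/mounts. A small Python sketch that does the same on text in that format (device, mountpoint, fstype, options, dump, pass); the function names and the sample line are hypothetical:

```python
# Check whether a mount carries the "ssd" option, given text in
# /proc/self/mounts format: device mountpoint fstype options dump pass.
# Function names and the sample line are hypothetical.
from typing import List, Optional


def mount_options(mounts_text: str, mountpoint: str) -> Optional[List[str]]:
    """Return the comma-separated options for `mountpoint`, or None if absent.

    The last matching line wins, since a later mount shadows an earlier one.
    Real mount points containing spaces appear escaped (e.g. \\040); this
    sketch does not unescape them.
    """
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mountpoint:
            found = fields[3].split(",")
    return found


def has_ssd_option(mounts_text: str, mountpoint: str) -> bool:
    # Exact match, so "nossd" does not count as "ssd".
    return "ssd" in (mount_options(mounts_text, mountpoint) or [])


sample = "/dev/sda5 / btrfs rw,noatime,ssd,space_cache 0 0\n"
print(has_ssd_option(sample, "/"))  # True
```

On a live system you would pass the contents of /proc/self/mounts instead of the sample string.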

I believe I've seen you mention using dmcrypt or the like, however, which 
probably doesn't pass whatever is used for ssd detection on thru, thus 
explaining btrfs not seeing it and having to specify it yourself, if you 
wish.

While I'm not sure, I /think/ btrfs may use the sysfs rotational file (or 
rather, the same information that the kernel exports to that file) for 
this detection.  For my bare-metal devices that's:

/sys/block/sdX/queue/rotational

For my ssds that file contains "0" while for spinning rust, it contains 
"1".

The contents of that file are derived in turn from the information 
exported by the device.  I believe the same information can be seen with 
hdparm -I, in the Configuration section, as Nominal Media Rotation Rate.

For my spinning rust that returns an RPM value such as 7200.  For my ssds 
it returns "Solid State Device".

The same information can be seen with smartctl -i, which has much shorter 
output so it's easier to find.  Look for Rotation Rate.

Again, my ssds report "Solid State Device", while my spinning rust 
reports a value such as "7200 rpm".
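The sysfs check can be scripted. This Python sketch reads /sys/block/<dev>/queue/rotational and maps "1" to rotational and "0" to non-rotational; the function names are made up, and whether btrfs consults exactly this flag is the same assumption Duncan hedges above. A dm-crypt mapping has the same file under its dm-N name, but, as noted above, the value there may not reflect the SSD underneath.

```python
# Read the kernel's rotational flag for a block device:
# /sys/block/<dev>/queue/rotational holds "1" for spinning disks, "0" otherwise.
# Whether btrfs itself uses exactly this flag is an assumption.
from pathlib import Path
from typing import Optional


def parse_rotational(text: str) -> Optional[bool]:
    """Map the file contents to True (rotational), False (non-rotational) or None."""
    value = text.strip()
    if value == "1":
        return True
    if value == "0":
        return False
    return None


def is_rotational(dev: str, sysfs_block: str = "/sys/block") -> Optional[bool]:
    """Return the rotational flag for `dev` (e.g. "sda"), or None if unavailable."""
    path = Path(sysfs_block) / dev / "queue" / "rotational"
    try:
        return parse_rotational(path.read_text())
    except OSError:
        return None


print(is_rotational("sda"))  # True, False, or None depending on the machine
```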

2) The only reason I happen to know about the SSD metadata single-device 
single mode default exception (where metadata otherwise defaults to dup 
mode on single-device, and to raid1 mode on multi-device regardless of 
the media), is as a result of I believe Chris Mason commenting on it in 
an on-list reply.

The reasoning given in that reply was not the erase-block reason I've 
seen someone else mention here (and which doesn't quite make sense to me, 
since I don't know why that would make a difference), but rather:

Some SSD firmware does automatic deduplication and compression.  On these 
devices, DUP-mode would almost certainly be stored as a single internal 
data block with two external address references anyway, so it would 
actually be single in any case, and defaulting to single (a) doesn't hide 
that fact, and (b) reduces overhead that's justified for safety 
otherwise, but if the firmware is doing an end run around that safety 
anyway, might as well just shortcut the overhead as well.
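The firmware argument can be illustrated with a toy model. The Python class below is a hypothetical content-addressed translation layer, not a description of any real drive's firmware: it keeps one physical block per distinct content, so writing the same metadata block twice (as DUP does) uses one block on flash, and a single bad block takes out both "copies". A drive without such dedup would keep two physical blocks.

```python
# Toy model (hypothetical, not any real SSD's firmware): a flash translation
# layer that deduplicates identical blocks by content hash. It shows why two
# DUP copies of the same metadata block may end up as one physical block.
import hashlib
from typing import Dict


class DedupFTL:
    def __init__(self) -> None:
        self.lba_to_key: Dict[int, str] = {}  # logical block address -> content hash
        self.flash: Dict[str, bytes] = {}     # content hash -> the one stored copy

    def write(self, lba: int, data: bytes) -> None:
        key = hashlib.sha256(data).hexdigest()
        self.flash.setdefault(key, data)      # identical data is stored only once
        self.lba_to_key[lba] = key            # (no garbage collection in this toy)

    def read(self, lba: int) -> bytes:
        return self.flash[self.lba_to_key[lba]]  # KeyError if the block is gone

    def physical_blocks(self) -> int:
        return len(self.flash)

    def fail_block_behind(self, lba: int) -> None:
        """Simulate the flash block backing `lba` going bad."""
        self.flash.pop(self.lba_to_key[lba], None)


ftl = DedupFTL()
node = b"metadata node" * 300
ftl.write(1000, node)  # first DUP copy
ftl.write(9000, node)  # second DUP copy at another address
print(ftl.physical_blocks())  # 1
ftl.fail_block_behind(1000)
# Reading LBA 9000 now raises KeyError: both "copies" shared one block.
```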

However, while the btrfs default will apply to all (detected) ssds, not 
all ssds have firmware that does this internal deduplication!

In fact, the documentation for my ssds sells its LACK of such compression 
and deduplication as a feature, pointing out that such features tend to 
make the behavior of a device far less predictable[3], tho they do 
increase maximum speed and capacity.

Which is why I've chosen to specify dup mode on my single-device btrfs 
here, even on ssds.[4]  While it'd be the wrong choice on ssds that do 
compression and deduplication, on mine, it's still the right choice. =:^)

If your SSDs don't do firmware-based dedup/compression, then dup metadata 
is still arguably the best choice on ssd.  But if they do, the single 
metadata default does indeed make more sense, even if that's not the 
default you're getting due to lack of ssd detection.

---
[1] Obviously ssd not detected: Assuming you didn't specify metadata 
level, probably a safe assumption or we'd not be having the discussion.  
Personally, I always make a point of specifying both data and metadata 
level here when doing a mkfs.btrfs, just to be sure.

[2] My btrfs are all on SSD.  I'm still using legacy reiserfs on my 
legacy spinning rust, but reiserfs' journaling behavior isn't appropriate 
for ssd, so where I've upgraded to ssd I use btrfs.  Which works out 
great since the spinning rust is backup for the ssds, and the very mature 
reiserfs is backup for the still under heavy development btrfs. =:^)

[3] Compression/deduplication performance:  Indeed, both the speed and 
capacity of devices with compression and deduplication varies greatly 
depending on the compressibility of the data, tho maximum speed and 
capacity is certainly greater, but it's not easily predictable.

[4] Most of my btrfs are raid1 mode across two devices.  I do have a 
couple single-device btrfs, however, /boot and its backup on the other 
device, instead of the usual raid1 mode across both devices but with a 
second raid1 btrfs primary backup of the first on a second set of 
partitions across the same devices, with the /boot exception being 
because it's a lot easier to tell the BIOS to boot from the other device 
and thus select the backup /that/ way, than it is to tell grub to use a 
different /boot!  Altho with grub2, it's actually possible to have it 
select the /boot too, but the BIOS selector method is still easier.

Of course being /boot and its backup, those single-device btrfs are both 
quite small, 256 MiB each, and I use mixed-bg (-M in mkfs.btrfs) mode for 
them as a result.  That means I dup both data and metadata at the same 
time, since they're mixed together, which in turn means the effective 
filesystem capacity is half the filesystem size, 128 MiB instead of 256.  
But 128 MiB is fine for /boot.  I just have to track the number of 
kernels (with attached initramfs on each one, dramatically increasing the 
individual kernel size) I have available a bit closer and delete them 
sooner than I might otherwise, plus watch the btrfs fi show output a bit 
more closely and do a balance when unallocated gets too low.  But I still 
have room to track a couple stable kernels, plus a dozen or so pre-
releases when I'm bisecting a kernel bug, before I have to start deleting.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

* Re: Is metadata redundant over more than one drive with raid0 too?
  2014-05-05  1:27       ` Marc MERLIN
  2014-05-06 19:05         ` Duncan
@ 2014-05-06 19:39         ` Duncan
  1 sibling, 0 replies; 15+ messages in thread
From: Duncan @ 2014-05-06 19:39 UTC (permalink / raw)
  To: linux-btrfs

Marc MERLIN posted on Sun, 04 May 2014 18:27:19 -0700 as excerpted:

> The original reason why I was asking myself this question and trying to
> figure out how much better -m raid1 -d raid0 was over -m raid0 -d raid0
> 
> I think the summary is that in the first case, you're going to be
> able to recover all/most small files (think maildir) if you lose one
> device, whereas in the 2nd case, with half the metadata missing, your FS
> is pretty much fully gone.
> Fair to say that?

Yes. =:^)

> Now, if I don't care about speed, but wouldn't mind recovering a few
> bits should something happen (actually in my case mostly knowing the
> state of the filesystem when a drive was lost so that I can see how many
> new files showed up since my last backup), it sounds like it wouldn't be
> bad to use:
> -m raid1 -d linear

Well, assuming that by -d linear you meant -d single. Btrfs doesn't call 
it linear, tho at the data safety level, btrfs single is actually quite 
comparable to mdadm linear.  =:^)  

(I had to check.  I knew I didn't remember btrfs having linear as an 
option, and hadn't seen any patches float by on the list that would add 
it, but since I'm not a dev I don't follow patches /that/ closely, and 
thought I might have missed it.  So I thought I better go check to see 
what this possible new linear option actually was, if indeed I had missed 
it.  Turns out I didn't miss it after all; there's still no linear option 
that I can see, unless it's there and simply not documented.  =:^)

> This will not give me the speed boost from raid0 which I don't care
> about, it will give me metadata redundancy, and due to linear, there is
> a decent chance that half my files are intact on the remaining drive
> (depending on their size apparently).

Yes. =:^)

> So one place I use it is not for speed but for one FS that gives me more
> space without redundancy (rotating buffer streaming video from security
> cams).
> At the time I used -m raid1 -d raid0, but it sounds for slightly extra
> recoverability, I should have used -m raid1 -d linear (and yes, I
> understand that one should not consider a -d linear recoverable when a
> drive went missing).

That appears to be a very good use of either -d raid0 or -d single, yes.  
And since you're apparently not streaming such high resolution video that 
you NEED the raid0, single does indeed give you a somewhat better chance 
at recovery.

Tho with streaming video I wonder what your filesizes are as video files 
tend to be pretty big.  If they're over the 1 GiB btrfs data chunk size, 
particularly if you're only running a two-device btrfs, you'd probably 
lose near all files anyway.

Assuming single data mode and file sizes between a GiB and 2 GiB, 
statistically you should lose near 100% on a two device btrfs with one 
dropping out, 67% on a three device btrfs with a single device dropout, 
50% on four devices, 40% on five devices...

If file sizes are 2-3 GiB, you should lose near 100% on 2-3 devices, 75% 
on four devices, 60% on five, 50% on six...

With raid0 data stats would be similar but I believe starting at 16 MiB 
with 4 MiB intervals.  Due to many files under 16 MiB being stored in the 
metadata, you'd lose few of them, but that'd jump to 100% loss at 16 MiB 
until you had 5+ devices in the raid0, with 16-20 MiB file loss chance on 
a 5-device raid0 80%, since chances would be 80% of one strip of the 
stripe being on the lost device.  (That's assuming my 4 MiB strip size 
assumption is correct, it could be smaller than that, possibly 64 KiB.)
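The percentages above follow a simple rule: a file spanning k allocation units, each on a different device, is damaged with probability min(1, k/n) when one of n devices is lost. A Python sketch of that rule under the same simplifying assumptions (file aligned to a chunk boundary, low fragmentation, every chunk on a distinct device); the function name is made up:

```python
# Rough model of the single-data loss estimate (assumptions, not btrfs
# internals): the file starts on a chunk boundary, each chunk it needs sits on
# a different device, and exactly one of n_devices is lost.
import math


def loss_fraction(file_size: int, n_devices: int, unit: int = 1024 ** 3) -> float:
    """Expected fraction of such files damaged when one device drops out.

    `unit` is the allocation granule: 1 GiB data chunks for single data by
    default; pass an assumed raid0 stripe size to reason about raid0 instead.
    """
    units = max(1, math.ceil(file_size / unit))  # allocation units the file spans
    return min(1.0, units / n_devices)           # chance one was on the lost device


GIB = 1024 ** 3
print([round(loss_fraction(int(1.5 * GIB), n), 2) for n in range(2, 7)])
# [1.0, 0.67, 0.5, 0.4, 0.33]
print([round(loss_fraction(int(2.5 * GIB), n), 2) for n in range(2, 7)])
# [1.0, 1.0, 0.75, 0.6, 0.5]
```

This reproduces the 100%/67%/50%/40% and 75%/60%/50% figures for 2 to 6 devices; misaligned or fragmented files can span one more unit than this assumes.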

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

* Re: raid0 vs single, and should we allow -mdup by default on SSDs?
  2014-05-06 17:16         ` Duncan
@ 2014-05-07  8:18           ` Marc MERLIN
  2014-05-07  8:29             ` Hugo Mills
  0 siblings, 1 reply; 15+ messages in thread
From: Marc MERLIN @ 2014-05-07  8:18 UTC (permalink / raw)
  To: Duncan; +Cc: linux-btrfs

Hi Chris and other devs,

Does it really make sense to turn off -mdup on SSDs? I would argue that
no. In my case dmcrypt protected me from that, so I'm happy, but even if
I didn't use it, I'd want the protection of -mdup, even if the
protection might only be partial.

On Tue, May 06, 2014 at 05:16:08PM +0000, Duncan wrote:
> Single only stripes in such extremely large (1 GiB data, quarter-GiB 
> metadata, per strip) chunks that it doesn't matter for speed, and then 
> only as a result of its chunk allocation policy.  If one can define such 
> large strips as striping, which it is in a way, but not really in the 
> practical sense.
 
Oh good, I didn't know it was that big.

> The effect of a lost device, then, is more or less random, tho for single 
> metadata the effect is likely to be quite large up to total loss, due to 
> the damage to the tree.  It's not out of thin air that the multi-device 

Yes. I totally use either -mdup or -mraid1.

> That contrasts with raid0, where the striping is at sizes well under a 
> chunk (memory page size or 4 MiB on x86/amd64 data I believe, tho the 
> fact that files under the 16 MiB node size may actually be entirely 
> folded into metadata and not have a data extent allocation at all skews 
> things for up to the 16 MiB metadata node size), so the definition of 
> "small file likely to be recovered" is **MUCH** smaller on raid0, than on 
> single.

Great to know, I'll use -m raid1 -d single next time.

> Effectively, with raid0 data you're only (relatively) likely to recover files 
> smaller than 16 MiB, while with single data, it's files smaller than 1 GiB.

Thanks much for that.

On Tue, May 06, 2014 at 07:05:52PM +0000, Duncan wrote:
> 1) In order to do that, btrfs (I guess mkfs.btrfs in this case) must be 
> able to detect that the device *IS* ssd.  Depending on the SSD, the 
> kernel version, and whether the btrfs is being created direct on bare-
> metal device or on some device layered (lvm or dmcrypt or whatever) on 
> top of the bare metal, btrfs may or may not successfully detect that.
> 
> Obviously in your case[1] the ssd wasn't detected.

Indeed.  I also found out why my SSD has -mdup: It's on top of dmcrypt
so btrfs failed to see it was an SSD and gave me -mdup. Good, that's
what I wanted anyway :)
 
> I believe I've seen you mention using dmcrypt or the like, however, which 
> probably doesn't pass whatever is used for ssd detection on thru, thus 
> explaining btrfs not seeing it and having to specify it yourself, if you 
> wish.
 
You guessed correctly, congrats.

> 2) The only reason I happen to know about the SSD metadata single-device 
> single mode default exception (where metadata otherwise defaults to dup 
> mode on single-device, and to raid1 mode on multi-device regardless of 
> the media), is as a result of I believe Chris Mason commenting on it in 
> an on-list reply.
> The reasoning given in that reply was not the erase-block reason I've 
> seen someone else mention here (and which doesn't quite make sense to me, 
> since I don't know why that would make a difference), but rather:

Yes. I personally don't think it's a good idea. Basically when having 2
copies, they could still end up on the same erase block, making them
less redundant.
My answer to that is 'so what?'
There are plenty of other times where dup would be useful on an SSD. I
really don't see the point of trying to turn it off by default just because
maybe in one case it would not offer extra protection.

> Some SSD firmware does automatic deduplication and compression.  On these 
> devices, DUP-mode would almost certainly be stored as a single internal 
> data block with two external address references anyway, so it would 
> actually be single in any case, and defaulting to single (a) doesn't hide 
> that fact, and (b) reduces overhead that's justified for safety 
> otherwise, but if the firmware is doing an end run around that safety 
> anyway, might as well just shortcut the overhead as well.

If some SSDs do this, let's not punish those who have SSDs that don't.
 
> However, while the btrfs default will apply to all (detected) ssds, not 
> all ssds have firmware that does this internal deduplication!

Exactly.

On Tue, May 06, 2014 at 07:39:12PM +0000, Duncan wrote:
> Well, assuming that by -d linear you meant -d single. Btrfs doesn't call 
> it linear, tho at the data safety level, btrfs single is actually quite 
> comparable to mdadm linear.  =:^)  

Yes, I meant single, sorry :)
(aka linear for mdadm)
 
> > At the time I used -m raid1 -d raid0, but it sounds for slightly extra
> > recoverability, I should have used -m raid1 -d linear (and yes, I
> > understand that one should not consider a -d linear recoverable when a
> > drive went missing).
> 
> That appears to be a very good use of either -d raid0 or -d single, yes.  
> And since you're apparently not streaming such high resolution video that 
> you NEED the raid0, single does indeed give you a somewhat better chance 
> at recovery.
 
zoneminder saves 'video' as a stream of independent small jpegs, so I'm
good. Actually come to think of it they're so small that they probably
all ended up in the raid1 metadata. That also means that I'm not getting
twice the storage space like I planned to. Oh well...

Thanks for all the answers.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901

* Re: raid0 vs single, and should we allow -mdup by default on SSDs?
  2014-05-07  8:18           ` raid0 vs single, and should we allow -mdup by default on SSDs? Marc MERLIN
@ 2014-05-07  8:29             ` Hugo Mills
  2014-05-07  8:52               ` Marc MERLIN
  0 siblings, 1 reply; 15+ messages in thread
From: Hugo Mills @ 2014-05-07  8:29 UTC (permalink / raw)
  To: Marc MERLIN; +Cc: Duncan, linux-btrfs

On Wed, May 07, 2014 at 01:18:40AM -0700, Marc MERLIN wrote:
> On Tue, May 06, 2014 at 07:39:12PM +0000, Duncan wrote:
> > That appears to be a very good use of either -d raid0 or -d single, yes.  
> > And since you're apparently not streaming such high resolution video that 
> > you NEED the raid0, single does indeed give you a somewhat better chance 
> > at recovery.
>  
> zoneminder saves 'video' as a stream of independent small jpegs, so I'm
> good. Actually come to think of it they're so small that they probably
> all ended up in the raid1 metadata. That also means that I'm not getting
> twice the storage space like I planned to. Oh well...

   There's a mount option to change the threshold at which files are
inlined in metadata: max_inline=<bytes>. You could play with that for
this particular use-case.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- I am but mad north-north-west:  when the wind is southerly, I ---  
                       know a hawk from a handsaw.                       

* Re: raid0 vs single, and should we allow -mdup by default on SSDs?
  2014-05-07  8:29             ` Hugo Mills
@ 2014-05-07  8:52               ` Marc MERLIN
  2014-05-07 22:39                 ` Mitch Harder
  0 siblings, 1 reply; 15+ messages in thread
From: Marc MERLIN @ 2014-05-07  8:52 UTC (permalink / raw)
  To: Hugo Mills, Duncan, linux-btrfs

On Wed, May 07, 2014 at 09:29:41AM +0100, Hugo Mills wrote:
> On Wed, May 07, 2014 at 01:18:40AM -0700, Marc MERLIN wrote:
> > On Tue, May 06, 2014 at 07:39:12PM +0000, Duncan wrote:
> > > That appears to be a very good use of either -d raid0 or -d single, yes.  
> > > And since you're apparently not streaming such high resolution video that 
> > > you NEED the raid0, single does indeed give you a somewhat better chance 
> > > at recovery.
> >  
> > zoneminder saves 'video' as a stream of independent small jpegs, so I'm
> > good. Actually come to think of it they're so small that they probably
> > all ended up in the raid1 metadata. That also means that I'm not getting
> > twice the storage space like I planned to. Oh well...
> 
>    There's a mount option to change the threshold at which files are
> inlined in metadata: max_inline=<bytes>. You could play with that for
> this particular use-case.

Oh cool, thank you.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901

* Re: raid0 vs single, and should we allow -mdup by default on SSDs?
  2014-05-07  8:52               ` Marc MERLIN
@ 2014-05-07 22:39                 ` Mitch Harder
  0 siblings, 0 replies; 15+ messages in thread
From: Mitch Harder @ 2014-05-07 22:39 UTC (permalink / raw)
  To: Marc MERLIN; +Cc: Hugo Mills, Duncan, linux-btrfs

On Wed, May 7, 2014 at 3:52 AM, Marc MERLIN <marc@merlins.org> wrote:
> On Wed, May 07, 2014 at 09:29:41AM +0100, Hugo Mills wrote:
>> On Wed, May 07, 2014 at 01:18:40AM -0700, Marc MERLIN wrote:
>> > On Tue, May 06, 2014 at 07:39:12PM +0000, Duncan wrote:
>> > > That appears to be a very good use of either -d raid0 or -d single, yes.
>> > > And since you're apparently not streaming such high resolution video that
>> > > you NEED the raid0, single does indeed give you a somewhat better chance
>> > > at recovery.
>> >
>> > zoneminder saves 'video' as a stream of independent small jpegs, so I'm
>> > good. Actually come to think of it they're so small that they probably
>> > all ended up in the raid1 metadata. That also means that I'm not getting
>> > twice the storage space like I planned to. Oh well...
>>
>>    There's a mount option to change the threshold at which files are
>> inlined in metadata: max_inline=<bytes>. You could play with that for
>> this particular use-case.
>
> Oh cool, thank you.
>

Since each non-inlined file will occupy a minimum of 4k, you may find
that inlining will still save space even if it is duplicated.

Even if they are duplicated in the metadata under RAID1, inlining a
bunch of 256 byte files will still be more space efficient than
storing them as regular files.

But if most of the files are in the 2k-3k range, it may be more
efficient to store them as files.
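Mitch's arithmetic can be checked with a rough Python model. The function names are made up, and the 4 KiB block size and copy counts are assumptions matching a -m raid1 -d single layout; per-item metadata overhead, which btrfs adds in both cases, is ignored.

```python
# Rough space comparison (assumptions: 4 KiB data blocks, metadata stored
# twice as with raid1 or dup, data stored once as with single or raid0;
# per-item metadata overhead is ignored in both cases).
import math

BLOCK = 4096


def inline_cost(size: int, metadata_copies: int = 2) -> int:
    """Bytes used if a file of `size` bytes is inlined into metadata."""
    return size * metadata_copies


def extent_cost(size: int, data_copies: int = 1) -> int:
    """Bytes used if it gets a regular data extent (whole blocks; size > 0)."""
    return math.ceil(size / BLOCK) * BLOCK * data_copies


for size in (256, 2048, 3000):
    print(size, inline_cost(size), extent_cost(size))
# 256 512 4096
# 2048 4096 4096
# 3000 6000 4096
```

Under these assumptions the break-even point is half a block (2 KiB): smaller files are cheaper inlined even with two copies, larger ones cheaper as regular extents, which lines up with the 2k-3k remark and with tuning the inlining threshold Hugo mentions.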

end of thread

Thread overview: 15+ messages
2014-05-03 23:27 Is metadata redundant over more than one drive with raid0 too? Marc MERLIN
2014-05-04  6:57 ` Brendan Hide
2014-05-04  7:24   ` Marc MERLIN
2014-05-04  7:44     ` Brendan Hide
2014-05-05  1:27       ` Marc MERLIN
2014-05-06 19:05         ` Duncan
2014-05-06 19:39         ` Duncan
2014-05-05  0:46     ` Daniel Lee
2014-05-05  5:06       ` Marc MERLIN
2014-05-06 17:16         ` Duncan
2014-05-07  8:18           ` raid0 vs single, and should we allow -mdup by default on SSDs? Marc MERLIN
2014-05-07  8:29             ` Hugo Mills
2014-05-07  8:52               ` Marc MERLIN
2014-05-07 22:39                 ` Mitch Harder
2014-05-04 21:49 ` Is metadata redundant over more than one drive with raid0 too? Duncan
